I. INTRODUCTION AND BACKGROUND

Model output statistics (MOS) is a technique whereby
parameters output from numerical weather prediction models
(predictors) are statistically processed, together with observed
data, to produce forecasts of one of the following categories
of parameters (as predictands):

a. operationally important parameters not output by the
numerical prediction model (e.g., visibility, cloud
cover, ceiling).

b. model output parameters whose predictive skill is
improved (e.g., surface wind, temperature) due to
correction of numerical model bias and/or scale.

Historically, the methodology has consisted of generating
empirical equations by a linear, least-squares regression
model. This technique is used by both the National Weather
Service and the United States Air Force Air Weather Service
and has demonstrated operationally usable skill in forecasting
numerous weather elements at locations over land
throughout the world [Best and Pryor, 1983]. Attempts by
the United States Navy to forecast open-ocean fog and visibility
using linear regression equations have shown skill
of marginal operational usefulness, although exceeding that of
persistence and climatology [Aldinger, 1979; Yavorsky, 1980;
Selsor, 1980; Koziara et al., 1983; Renard and Thompson, 1984].
Presumably, this level of performance is due, in part, to
the lack of 'calibrated' fog and visibility observations.
Shipboard weather observers lack sufficient reference points
to estimate the range of atmospheric visibility accurately.

In the spring of 1983, the United States Navy decided
to begin development of a MOS program to forecast
operational air/ocean parameters over the oceans of the
world. Primarily because of the importance of horizontal
visibility to the mariner, this parameter was selected as
the initial candidate. However, because of less-than-perfect
prior results using linear regression in the North Pacific
Ocean, it was decided to investigate other methodologies
to determine whether a better one could be found.

This study presents statistical methodologies proposed by
Preisendorfer (1983 a,b,c). Specifically, three strategies,
two based on maximum-probability and one based on natural-
regression, are further developed, tested and applied to sets
of model output parameters from both the North Pacific and
North Atlantic Ocean areas. In addition, multiple linear
regression is applied to the same data. Innovative threshold
techniques, developed by Lowe (1984a), are also applied, and
the methodologies are compared.

In the following discussion, a sufficient number of terms
and symbols are defined to allow readers without strong
statistical backgrounds to understand the results. However,
for a proper understanding of the Preisendorfer (1983 a,b,c)
methodology, readers are encouraged to read Appendix A,
which contains a detailed discussion. Similarly, details on
the linear regression model and threshold procedures [Lowe,
1984a] are to be found in Appendix B.
II. OBJECTIVE AND APPROACH

The objective of this study is to determine whether a statistical
methodology, applied to discrete values of model
output and derived parameters, can improve upon the forecasting
of horizontal marine atmospheric visibility when
compared to linear regression. The approach is as follows:

a.  define  categorical  groupings  of  visibility  which 
relate  to  operational  use  at  sea. 

b. develop and apply the Preisendorfer (1983 a,b,c)
methodology using July 1979 North Pacific Ocean data.

c. apply the methodology developed in b. above to June
1983 North Atlantic Ocean data.

d. compare Preisendorfer (1983 a,b,c) results to those
of the Lowe (1984a) linear regression approach for
the North Pacific and North Atlantic Ocean data sets.
III. DATA

A.   VISIBILITY  OBSERVATIONS  AND  SYNOPTIC  CODE 

Visibility observations at sea are reported as one of
ten synoptic codes, ranging from 90 (visibility less than
50 m) to 99 (visibility equal to or greater than 10 km).
However, in view of the inexactness of observing and recording
marine visibility in category form, and the further
degradation of its interpretation by users in forecasting,
a simplified categorization of visibility was developed as
follows:


category     synoptic code     visibility range

I            90-94             < 2 km
II           95-96             ≥ 2 km and < 10 km
III          97-99             ≥ 10 km

This scheme is based upon the following operational
criteria, which apply when observed visibility falls below
the indicated value:

1. 10 km (5 n mi): United States Navy aircraft carrier
flight recovery operations change from visual to controlled
approach [Department of the Navy, 1979].

2. 2 km (1 n mi): sounding of reduced visibility signals
for all vessels operating in international waters.
(The term 'reduced visibility' is not defined in the
International Regulations for Preventing Collisions at
Sea, 1972. However, United States Navy Captains and
Merchant Marine Masters generally consider it to be
1 n mi.)

B.  NORTH  PACIFIC  OCEAN  DATA 

The data from the North Pacific Ocean are described by
Selsor (1980) and Koziara et al. (1983). Only the July 1979
model initialization (TAU00) data are used, consisting of 19
model output parameters (MOP's) from the Northern Hemisphere
models operational in 1979, namely, the Mass Structure Analysis,
the Primitive Equation and the Marine Wind Models, and
one climatological visibility parameter from the National
Oceanic and Atmospheric Administration's National Climatic
Data Center (NCDC), Asheville, North Carolina. Two additional
parameters were derived from this set. A description of the
parameters is found in Appendix C.

C. NORTH ATLANTIC OCEAN DATA

1. Area

The North Atlantic Ocean, from 0° to 80°N, was
divided into physically homogeneous areas by Lowe (1984b)
using a cluster analysis technique. The primary
area used in this study is identified as area 3W on Fig. 1,
which illustrates the North Atlantic Ocean homogeneous areas.
This area was chosen because of the relatively frequent
occurrence of poor visibility as compared to the other areas.
A summary of visibility frequencies, for each homogeneous
area and three visibility categories, is contained in Table I.

2. Time Period

Data from 15 May 1983 through 15 July 1983 were
combined to form the June 1983 data set, hereafter referred
to as FATJUNE. FATJUNE was chosen as the initial data set
because of the high frequency of occurrence of poor visibility
during this period. In order to maximize the credibility
of visibility observations, 1200 GMT synoptic ship
report data were used exclusively since this time corresponds
to daylight over the entire area of study during FATJUNE.

Model output parameter data (predictors) at the 1200 GMT
model output time, hereafter referred to as TAU00, were used
in the development of the Preisendorfer (1983 a,b,c) methodology,
time not being available to pursue the scheme beyond that
stage. Thus, TAU00 represents model initialization time.
However, the term 'forecast' will be used throughout this
study to represent the estimate of visibility at this
initialization time.

3. Synoptic Weather Reports

All synoptic visibility observations (predictand
data) for this study were quality-control checked and provided
by the Naval Oceanography Command Detachment (NOCD)
co-located with the NCDC. Those furnished observations which
contain systematic observer error or are suspect or obviously
erroneous, as determined from the data quality indicators,
are not incorporated in the final data set.
4. Predictor Parameters

Fifty TAU00 model output parameters (MOP's) (predictor
data) were provided for the period of study by the Fleet
Numerical Oceanography Center (FNOC), Monterey, California.
These parameters are from their current operational prediction
model, the Navy Operational Global Atmospheric Prediction
System (NOGAPS). All MOP's were interpolated from model grid
coordinates to synoptic ship observation positions using a
linear interpolation scheme. Of the 50 parameters provided,
only 35 were used in the development of the Preisendorfer
(1983 a,b,c) and Lowe (1984a) methodologies, the remainder
being considered either to have little likelihood of
importance in the forecasting of visibility or to be unusable
due to the lack of significant digits (which were lost during
the transfer from FNOC tapes to the main computer center's
mass storage data system). Twelve additional parameters were
derived from the interpolated MOP's. Seven of these are
equations derived from a linear regression model which will
be described in Chapter V and Appendix B. Each equation
represents an estimate of the visibility category, which is
used as a predictor. A list of all of the predictor parameters
is provided in Appendix D.

D.   DEPENDENT/INDEPENDENT  DATA  SETS 

Due to the limited amount of data available to this
study for each of the North Atlantic Ocean homogeneous
areas, it was necessary to withhold one-third of the
observations from the developmental model to use as an independent
data set. This was accomplished by the use of a
counter and transfer statement in the computer programs which
prevented every third observation from entering the developmental
computations. To ensure that the dependent and independent
data were representative of the same population, a
95% confidence interval for proportions [Miller and Freund,
1977] was established from the entire data set, for each
visibility category, and the dependent and independent data
sets were constrained to have visibility frequencies within
these established confidence intervals. This same procedure
was applied to the North Pacific Ocean data for consistency of
method. Table II summarizes the dependent and independent
data for both the North Atlantic Ocean and North Pacific
Ocean data sets.
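
The withholding of every third observation and the check on category frequencies can be sketched in a few lines of Python. This is an illustration only, not the original program: the function names, the normal-approximation form of the 95% confidence interval for a proportion, and the toy data are assumptions made here.

```python
import math

def split_every_third(observations):
    """Withhold every third observation as independent data."""
    dependent, independent = [], []
    for count, obs in enumerate(observations, start=1):
        # Simple counter: observations 3, 6, 9, ... are withheld.
        if count % 3 == 0:
            independent.append(obs)
        else:
            dependent.append(obs)
    return dependent, independent

def proportion_ci(categories, category, z=1.96):
    """Normal-approximation 95% interval for a proportion
    (an assumed form; the text cites Miller and Freund, 1977)."""
    n = len(categories)
    p = sum(1 for c in categories if c == category) / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

def is_representative(full, subset, labels=(1, 2, 3)):
    """True if each category frequency of `subset` lies inside the
    interval computed from the entire data set."""
    for label in labels:
        low, high = proportion_ci(full, label)
        freq = sum(1 for c in subset if c == label) / len(subset)
        if not (low <= freq <= high):
            return False
    return True

# Toy visibility categories (1 = I, 2 = II, 3 = III), illustration only.
vis = [1, 3, 3, 2, 3, 3, 3] * 60
dep, ind = split_every_third(vis)
```

A split that fails `is_representative` for either subset would, under this reading, have to be redrawn or adjusted before model development.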
IV. PRELIMINARY EXPERIMENTS

A.   TERMS  AND  SYMBOLS 

The  terms  and  statistical  symbols  defined  below  will  be 
used  throughout  the  remainder  of  this  report.   The  formal 
mathematical  definitions  can  be  found  in  Appendices  A  and 
E. 

1. Maximum-probability strategy--choosing forecast
visibility categories based upon the highest conditional
probabilities of visibility within a predictor interval.

2. MAXPROB1--designation of the maximum-probability
strategy in which ties of the highest conditional
probabilities in a predictor interval are resolved by
the generation of a random number.

3. MAXPROB2--designation of the maximum-probability
strategy in which ties of the highest conditional
probabilities in a predictor interval are resolved by
assigning the lowest visibility category, of those
tied, as the forecast category.

4. Natural-regression strategy--choosing forecast visibility
categories based upon the statistical average
of the conditional probabilities of visibility within
a predictor interval.

5. a_0--the probability of a zero-class visibility category
forecast error (e.g., visibility category I is forecast
and category I is also observed).

6. a_1--the probability of a one-class visibility category
forecast error (e.g., visibility category I is
forecast and category II is observed).

7. a_2--the probability of a two-class visibility category
forecast error (e.g., visibility category I is
forecast and category III is observed).

8. CE--class error parameter, defined as a_1 + 2a_2, used to
identify the first predictor.

9. PP--the potential predictability of visibility by
any given predictor.

10. FD--the functional dependence of one predictor on
another. This is a measure of functional dependence
of a statistical kind and not of the deterministic
kind. The term 'functional dependence' is used by
Preisendorfer (1983c) and, being sufficiently descriptive
of the concept, it will be used herein.

11. RSS FD--root sum squared FD. The functional dependence
of a predictor on all predictors already included in
the developmental model. It is equal to the square
root of the sum of the squares of the individual FD's.

12. TS1--threat score for visibility category I computed
from a contingency table.

13. ATS1--adjusted threat score for visibility category
I, which removes the influence of the data set category
frequency.

14. AA0--adjusted a_0. A contingency table statistic
which removes the influence of the most frequent visibility
category in a set of data (similar to a normalized
value).

15. EPI--equally populous predictor interval used to
discretize the predictors.
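
As a rough illustration of items 5 through 8 and 12, the class-error probabilities and a threat score can be read off a 3 x 3 contingency table of forecast versus observed categories. The sketch below is an assumption-laden example: the function names and the table values are invented here, and the threat score uses the conventional hits / (forecasts + observations - hits) form, which may differ in detail from the formal definition in Appendix E.

```python
def class_error_stats(table):
    """table[i][j] = number of cases with forecast category i and
    observed category j (indices 0, 1, 2 stand for I, II, III).
    Returns (a_0, a_1, a_2, CE) with CE = a_1 + 2*a_2."""
    total = sum(sum(row) for row in table)
    a = [0.0, 0.0, 0.0]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # |i - j| is the number of classes by which the forecast misses.
            a[abs(i - j)] += count / total
    ce = a[1] + 2.0 * a[2]
    return a[0], a[1], a[2], ce

def threat_score(table, k):
    """Conventional threat score for category index k (assumed form)."""
    hits = table[k][k]
    forecast = sum(table[k])
    observed = sum(row[k] for row in table)
    denom = forecast + observed - hits
    return hits / denom if denom else 0.0

# Hypothetical contingency table, for illustration only.
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 115]]
a0, a1, a2, ce = class_error_stats(table)
ts1 = threat_score(table, 0)
```

The adjusted scores (ATS1, AA0) are not sketched because their normalization is defined only in the appendices.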

B.   COMPUTER  PROGRAMS 

Four computer programs were developed to test the
proposed Preisendorfer (1983 a,b,c) methodology. The
programs are on file in the Department of Meteorology, Naval
Postgraduate School, Monterey, California, 93943.

1. A program to compute a_0, a_1, CE and PP for all predictors,
all strategies (MAXPROB1, MAXPROB2 and Natural-
Regression) and a single number of EPI's. Statistics
for the three strategies are based upon the same predictor(s)
rather than the best predictor(s) for each
strategy. It was determined during program development,
and will be shown in Chapter VI, that, in general, each
of the strategies chose the same predictor(s).

2. A program to compute FD for all predictors, on a given
predictor, for a given number of EPI's, and to compute
the upper 5% critical value (FD(96)) by Monte-Carlo
means (Appendix A).

3. A program to construct contingency tables and to compute
skill and threat scores, for both the dependent
and independent data.

4. A program to generate 100 random data sets, from the
marginal probabilities of the predictor(s) in the
developmental model, and to compute upper and lower
5% critical values for a_0 and a_1 to be used for testing
the significance of the results from the Preisendorfer
(1983 a,b) methodology against chance.

C. BEHAVIOR OF a_0 AND THREAT SCORES

Before attempting a formal application of the Preisendorfer
(1983 a,b,c) methodology, it was considered prudent
to investigate the behavior of certain statistics as the
number of equally populous predictor intervals was changed
and as new predictors were added. It was found, during
program testing and before a formal procedure had been established,
that the independent data threat score of visibility
category I (TS1) generally showed higher values than other
threat scores (TS2, TS12) for the independent data. Therefore,
it was decided that the dependent and independent data
a_0 and TS1 scores would be compared. The statistic a_0 was
chosen because it is the single most important scoring
parameter in the Preisendorfer methodology.

The experiment consisted of choosing the first predictor
as the one which gave the highest a_0 value when divided
into ten equally populous intervals. Once this predictor
was chosen, dependent and independent data a_0 and TS1 scores
were computed for each number of intervals as the number was
varied from two to 100. Prior to proceeding to the next
step, the number of intervals which gave the highest independent
data TS1 score was identified and the first predictor
was held at this number of intervals for the remainder of
the experiment.

Subsequent predictors were chosen by both a maximum a_0
test and a functional dependence test. As each subsequent
predictor was identified, its number of equally populous
intervals was varied from two to 50 (or fewer, as the maximum
array size was set at 120,000). The number of equally populous
intervals giving the highest independent data TS1 was
identified and held fixed for the following stage. This procedure
was repeated until either six predictors were used or
until a new predictor addition did not allow the comparison
of at least intervals two through ten, due to computer
storage limitations. It should be noted here that all of
the North Atlantic Ocean parameters, not including linear-
regression equations, were used in these experiments and,
subsequently, some parameters were removed from consideration
(Appendix D).
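
The two building blocks of this experiment, dividing a predictor into equally populous intervals and applying a maximum-probability strategy within each interval, can be sketched as follows. This is a single-predictor illustration under stated assumptions: the function names and sample values are invented here, intervals are assigned by rank (so tied predictor values may straddle a boundary), and the tie rules only mimic MAXPROB1 and MAXPROB2 as described in the list of terms.

```python
import random
from collections import Counter, defaultdict

def epi_indices(values, n_intervals):
    """Assign each value to one of n equally populous intervals by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    idx = [0] * len(values)
    for rank, i in enumerate(order):
        idx[i] = rank * n_intervals // len(values)
    return idx

def max_prob_table(intervals, categories, tie="lowest", rng=None):
    """For each interval, choose the category with the highest conditional
    relative frequency. tie="lowest" mimics MAXPROB2 (lowest tied
    category); tie="random" mimics MAXPROB1 (random choice)."""
    counts = defaultdict(Counter)
    for k, c in zip(intervals, categories):
        counts[k][c] += 1
    rng = rng or random.Random(0)
    table = {}
    for k, ctr in counts.items():
        best = max(ctr.values())
        tied = sorted(c for c, n in ctr.items() if n == best)
        table[k] = tied[0] if tie == "lowest" else rng.choice(tied)
    return table

def a0_score(table, intervals, categories):
    """Fraction of cases whose forecast category equals the observed one."""
    hits = sum(1 for k, c in zip(intervals, categories) if table.get(k) == c)
    return hits / len(categories)

# Made-up predictor values and visibility categories (1 = I, 2 = II, 3 = III).
predictor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
vis = [1, 1, 2, 1, 3, 3, 2, 3]
idx = epi_indices(predictor, 2)
table = max_prob_table(idx, vis)
score = a0_score(table, idx, vis)
```

Repeating the last three lines for each number of intervals, on dependent and independent data separately, gives curves of the kind the experiment compares.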

1. Maximum a_0 Method

The first NOGAPS predictor selected was SMF, which
was varied from two to 100 EPI's (Fig. 2a); the highest
TS1 score was obtained with six intervals. The second predictor
chosen, when SMF was held at six intervals and all
others at ten, was DTDP, which produced the highest a_0 value
for two predictors. Holding SMF at six intervals, DTDP was
varied from two to 50 intervals (Fig. 2b) and the highest
TS1 score was obtained at 20 intervals. Anticipating problems
with the subsequent array size with respect to the number of
predictors which could be included, the secondary TS1 maximum
at 16 intervals was used for further stepping. The third and
subsequent predictors and their optimum interval sizes were
PS at 12 (Fig. 2c), UBLW at ten (Fig. 2d) and V400 (Fig. 2e).
The optimum number of intervals for V400 was not germane as
no further stepping was done after this step. As illustrated
in Fig. 2, the dependent data statistics asymptotically approach
unity, as predictors are added, while the independent data
statistics (approximate maximum values: a_0 = .70, TS1 = .35)
show no further increase after the third predictor is included,
which may imply a limit as to how well the methodology performs
on this particular data set.

2. Functional Dependence Method

As functional dependence is not considered until after
the selection of the first NOGAPS predictor, Fig. 2a is also
applicable to this method. Subsequent predictors were chosen
as those having the lowest RSS FD using ten equally populous
intervals. The predictors selected and their optimum interval
sizes, for the TS1 score, were RH at three (Fig. 3a),
DUDP at four (Fig. 3b), VOR925 at two (Fig. 3c), ENTRN at
14 (Fig. 3d) and UBLW (Fig. 3e), which was the last predictor
considered. As seen for the maximum a_0 method, the dependent
data statistics asymptotically approach unity. However, the
independent data statistics continue to grow at least through
the addition of the sixth predictor (approximate maximum
values: a_0 = .71, TS1 = .38). This method gave better results
than the maximum a_0 method, though it, too, may imply a
limit. The results of this experiment also tend to show a
preferential selection of a small number of EPI's, for best
independent data TS1 score, as well as indicating that functional
dependence is a relatively good choice as a deciding
factor for choosing predictors.

D.   BEHAVIOR  OF  FUNCTIONAL  DEPENDENCE 

Another statistic investigated prior to the formal
application of the Preisendorfer (1983 a,b,c) methodology
was the distribution of functional dependence (FD) calculated
from 100 randomly generated data sets. The FD calculation is
based upon the relationship of the distribution of one predictor
to another. Because the predictors are divided into
the same number of EPI's for the calculation, the probability
of a randomly generated number falling into any given interval
for either predictor will be the same. Therefore, the
randomly generated FD values should be a function only of
the number of intervals and the number of data cases (subsequent
randomly generated calculations, during the formal
application of the methodology, showed this to be true).

The randomly generated FD experiment consisted of computing
the mean, upper and lower 5% critical values, and the
standard deviation of the 100 randomly generated values for
both 1526 observations (as in the North Atlantic Ocean Area
3W dependent data) and 3682 observations (as in the North
Pacific Ocean dependent data) and a comparison of the
results. As illustrated in Fig. 4, the FD values are similar
for a given interval size, differing only in the size of the
confidence interval and the standard deviation. The FD
values calculated for 3682 observations lie totally within
the upper and lower 5% critical values for 1526 observations.
Because of this relationship, future FD(96) values, used to
qualitatively determine how well a new predictor will contribute
to the developmental model, can be obtained by reading
from the graph rather than using valuable computer
resources, provided the number of equally populous intervals
is less than or equal to ten.
V. PROCEDURES

A.   PREISENDORFER  METHODOLOGY 

1.   Determination  of  the  First  Predictor  in  Relation 
to  the  Number  of  Predictor  Intervals 

A matter not considered in Preisendorfer (1983 a,b,c)
is how to choose an optimum number of equally populous predictor
intervals (EPI's) into which predictor data should
be divided. During the course of development, two important
realizations became evident, namely, (a) there is a tendency
for the methodology to give better results using a small
number of intervals, and (b) the NPS W.R. Church Computer
Center limits internal computer storage space to two megabytes
for routine programs. The first suggested, while the
second forced, the research to be limited to EPI's of less
than or equal to ten if more than three or four predictors
were to be considered. Once this was established, a procedure
was developed to look at all EPI's within the stated
limit.

The procedure involves computing the initial statistics
(a_0, a_1, CE and PP) for each predictor, for each strategy
(maximum-probability and natural-regression) and for EPI's
of two through ten. Then, the best first predictor for each
number of EPI's is determined, for each strategy, by meeting
one or both of the following conditions, when considered in
the indicated order:

a. lowest CE

b. highest PP
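
One way to read the ordered conditions above is as a two-key ranking: lowest CE decides, and highest PP is consulted only when CE values tie. The sketch below assumes that reading; the function name is hypothetical and the CE and PP numbers are invented for illustration (only the predictor names come from the text).

```python
def best_first_predictor(stats):
    """stats maps predictor name -> (CE, PP).
    Lowest CE wins; highest PP breaks ties (assumed interpretation)."""
    return min(stats, key=lambda name: (stats[name][0], -stats[name][1]))

# Invented statistics for one strategy and one number of EPI's.
stats_10_epi = {
    "SMF": (0.40, 0.12),
    "RH": (0.40, 0.15),
    "PS": (0.45, 0.30),
}
first = best_first_predictor(stats_10_epi)
```

In practice the function would be called once per strategy and per number of EPI's from two through ten.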

Once the best predictor for each number of EPI's is
known, it is then necessary to determine the optimum number
of EPI's. This is accomplished by computing threat and skill
scores (Appendix E) for both the dependent and independent
data and choosing, as the optimum number of EPI's, that which
gives both a relatively high adjusted a_0 (AA0) for the dependent
data and a relatively high adjusted threat score for
visibility category I (ATS1) for the independent data. This
becomes a somewhat subjective endeavor and remains the
only imprecise step in the methodology.

The statistic ATS1 is used on the independent data,
instead of a_0, because it is the poor visibility categories
(I and II) that are of primary forecast interest and their
forecastability is manifested in their threat scores. It
will be shown that, in general, the adjusted threat scores
for visibility category II (ATS2) and for combined visibility
categories I and II (ATS12) are small compared to ATS1, or
negative, and that ATS12 is maximized when ATS1 is maximized.
Additionally, it will be shown that maximum a_0 does not
necessarily coincide with maximum ATS1 in the independent
data. Hence, if a_0 were used, the optimum combination of
predictors necessary to forecast the poor visibility categories
would not be included.

Once the number of EPI's is established, it is fixed
for all subsequent predictors considered for the developmental
model. Holding the number of intervals fixed is not an
absolute necessity; however, it allows for a much more rapid
development of the model. Once this number is determined for
the first predictor, it is used to calculate FD for the next
predictor because FD is calculated using the established
number of EPI's. The next stage statistics (a_0, a_1, CE and
PP) are also computed with each predictor divided into this
same number of EPI's.

2. Choosing the Second Predictor

The second predictor to be included in the model is
determined from its FD on the first predictor and from the
increase in a_0 resulting from its inclusion. This is accomplished
by computing a_0 with two predictors, namely, the
first predictor, as determined above, with each of the
remaining predictors. Those predictors which do not increase
a_0 above its value as determined with the first predictor
alone are removed from further consideration for inclusion
into the set of predictors in the developmental model. FD
for each of the remaining predictors vs. the first predictor
is computed. The remaining predictor with the lowest FD
on the first predictor is chosen as the second predictor in
the model.

3. Choosing Subsequent Predictors

Subsequent predictor determination is similar to the
second predictor determination. Compute a_0 with N predictors
(N = 1,...,M+1; M = the number of predictors already in the
developmental model), that is, the first through Mth predictors,
as previously determined, and each of the remaining
predictors. Those predictors which do not increase a_0 above
its value as determined with M predictors are removed from
further consideration. RSS FD is computed for each of the
remaining predictors and the one with the lowest RSS FD is
chosen as the Nth predictor in the model.
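
The screening and selection step just described can be sketched as below. The FD values themselves are taken as given inputs, since their formal definition is in Appendix A; the function names, argument layout and numbers are assumptions made for this illustration.

```python
import math

def rss_fd(fd_values):
    """Root sum squared FD of a candidate on the predictors in the model."""
    return math.sqrt(sum(fd * fd for fd in fd_values))

def choose_next(candidates, a0_current, a0_with, fd):
    """candidates: predictor names not yet in the model.
    a0_current: a_0 obtained with the M predictors already chosen.
    a0_with[name]: a_0 when `name` is added as predictor M+1.
    fd[name]: FD of `name` on each predictor already in the model.
    Returns the chosen name, or None if no candidate raises a_0."""
    kept = [c for c in candidates if a0_with[c] > a0_current]
    if not kept:
        return None
    return min(kept, key=lambda c: rss_fd(fd[c]))

# Invented values for three hypothetical candidates A, B and C.
chosen = choose_next(
    ["A", "B", "C"],
    0.60,
    {"A": 0.65, "B": 0.59, "C": 0.62},
    {"A": [0.3, 0.4], "B": [0.0, 0.0], "C": [0.1, 0.2]},
)
```

Candidate B has the lowest RSS FD in this toy case but is screened out first because it does not increase a_0.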

4. Significance Tests

After each stage (i.e., after each new predictor to
be included in the developmental model is determined) it is
necessary to determine if the results are significant. This
is accomplished by Monte-Carlo means using the data set
marginal probabilities of the predictors and assuming equal
probability of occurrence for visibility categories (Appendix
A). The statistics a_0 and a_1 are computed for each of
100 randomly generated data sets of a size equal to the
number of observations in the dependent data set being tested,
and sorted from lowest to highest. The 96th value of a_0
(a_0(96)) and the fifth value of a_1 (a_1(05)) are retained as
the upper and lower 5% critical values. For developmental
model results to be significantly better than chance, a_0
must be greater than or equal to a_0(96) and a_1 must be less
than or equal to a_1(05).
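
A simplified version of this Monte-Carlo test, for a single discretized predictor, might look like the following. The sketch assumes a MAXPROB2-style tie rule, draws interval labels from supplied marginal probabilities, and draws the three visibility categories with equal probability; the function names, the seed, and the use of ten equally likely intervals are choices made here, not details taken from Appendix A.

```python
import random
from collections import Counter, defaultdict

def a0_a1_max_prob(intervals, categories):
    """a_0 and a_1 of a maximum-probability forecast built from the
    same data (ties resolved toward the lowest category)."""
    counts = defaultdict(Counter)
    for k, c in zip(intervals, categories):
        counts[k][c] += 1
    hits = one_off = 0
    for ctr in counts.values():
        best = max(ctr.values())
        forecast = min(c for c, n in ctr.items() if n == best)
        for c, n in ctr.items():
            if c == forecast:
                hits += n
            elif abs(c - forecast) == 1:
                one_off += n
    total = len(categories)
    return hits / total, one_off / total

def critical_values(n_obs, interval_probs, n_sets=100, seed=0):
    """96th sorted value of a_0 and 5th sorted value of a_1 over
    n_sets random data sets of n_obs cases each."""
    rng = random.Random(seed)
    labels = list(range(len(interval_probs)))
    a0s, a1s = [], []
    for _ in range(n_sets):
        intervals = rng.choices(labels, weights=interval_probs, k=n_obs)
        categories = [rng.randint(1, 3) for _ in range(n_obs)]
        a0, a1 = a0_a1_max_prob(intervals, categories)
        a0s.append(a0)
        a1s.append(a1)
    a0s.sort()
    a1s.sort()
    return a0s[95], a1s[4]

def significant(a0, a1, a0_96, a1_05):
    """Both conditions of the text must hold."""
    return a0 >= a0_96 and a1 <= a1_05

a0_96, a1_05 = critical_values(1526, [0.1] * 10)
```

With several predictors in the model, the same idea applies to the joint interval cell of all predictors rather than to a single interval label.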

5. Terminating the Selection of Predictors

Model development continues until any one of four
conditions is met:

a. no more predictors remain to be considered.

b. results are no longer significant.

c. required computer region size exceeds that which is
allowed (two megabytes at the NPS W.R. Church Computer
Center).

d. independent data ATS1 does not increase for two
consecutive predictor additions. (It will be shown
that there is a point in the development of the model
where the skill and threat scores for the dependent
data diverge sharply from those for the independent
data. This condition for terminating model development
is a subjective attempt at taking this point into
consideration.)

Once the model development is complete, contingency
tables of forecast visibility categories vs. observed visibility
categories, for both the dependent and independent
data, are constructed. From the contingency tables, threat
and skill scores for both data sets are computed and compared.
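
The four stopping conditions can be collected into a single check, sketched below. The function name, the return strings, the byte count used for "two megabytes", and the reading of condition d (each of the last two additions fails to raise ATS1 above the preceding stage) are all assumptions made for this illustration.

```python
TWO_MEGABYTES = 2 * 1024 * 1024  # assumed byte count for the storage limit

def stop_reason(remaining, significant, required_bytes, ats1_history,
                limit_bytes=TWO_MEGABYTES):
    """Return a short reason for stopping, or None to continue.
    ats1_history[k] is the independent-data ATS1 with k+1 predictors."""
    if not remaining:
        return "no predictors remain"
    if not significant:
        return "results not significant"
    if required_bytes > limit_bytes:
        return "storage limit exceeded"
    if len(ats1_history) >= 3:
        before, first, second = ats1_history[-3:]
        if first <= before and second <= first:
            return "ATS1 did not increase for two additions"
    return None

# Invented example: ATS1 stalls after the third predictor.
reason = stop_reason(["UBLW"], True, 500_000, [0.20, 0.31, 0.35, 0.35, 0.33])
```

The check would be run after each new predictor is added and tested for significance.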

B.   COMPARISON  METHODOLOGY 

The results obtained from the Preisendorfer (1983 a,b,c)
methodology were compared to two variations of a linear,
least-squares regression model. The model chosen for the
comparison is that available in the BMDP Statistical Software
(namely BMDP2R) [University of California, 1981] using two
new threshold schemes developed by Lowe (1984c) (Appendix B).
The equations developed by BMDP2R include all predictors which
increased R-squared (the proportion of the predictand variance
explained by the estimation of the predictand from the
multiple regression equation) by at least 1%. An excellent
description of this procedure is given by Best and Pryor
(1983), with R-squared being equivalent to their R-value.
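
The 1% R-squared rule can be illustrated with a simple forward selection. This is only a sketch under stated assumptions: it is not the BMDP2R algorithm (whose own entry and removal criteria are not described here), the function names are hypothetical, and "1%" is read as an absolute gain of 0.01 in R-squared.

```python
def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[k] for col in columns] for k in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[k][r] * X[k][c] for k in range(n)) for c in range(p)]
         + [sum(X[k][r] * y[k] for k in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((y[k] - sum(b * x for b, x in zip(beta, X[k]))) ** 2
                 for k in range(n))
    return 1.0 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Add the best remaining predictor while R-squared rises by min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = dict(predictors)
    while remaining:
        scores = {name: r_squared([predictors[c] for c in chosen] + [col], y)
                  for name, col in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        best_r2 = scores[name]
        del remaining[name]
    return chosen, best_r2
```

The sketch assumes the predictor columns are not collinear and that the predictand is not constant.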

1.  Method  1 

The first linear regression method consists of
generating a single equation, trained on the dependent data,
with the predictand set equal to 1, 2 or 3, corresponding to
visibility categories I, II and III, respectively. This
equation is used to determine threshold values (Appendix B)
and is then applied to the independent data.

2.  Method  2 

The  second  linear  regression  method  is  based  on  a 
decision-tree  scheme  using  two  linear-regression  equations 
trained  on  the  dependent  data.   The  first  equation  is 
generated  with  the  predictand  values  set  equal  to  zero  or 
one,  corresponding  to  combined  visibility  categories  I  and 
II  (0)  and  visibility  category  III  (1).   The  second  equation 
is  generated  with  the  predictand  set  equal  to  zero  or  one, 
corresponding  to  visibility  category  I  (0)  and  visibility 
category  II  (1).   Visibility  category  III  observations  are 
ignored  during  this  linear  regression.   Threshold  values  are 
then  computed  for  each  equation. 

When both equations and their associated threshold
values are known, the independent data set is sorted into
visibility category III and visibility category 'other' by
the first equation, and the 'other' category is sorted into
visibility categories I and II by the second equation.
Following the development of linear regression method 1 and
method 2, contingency tables are constructed, skill and
threat scores computed, and comparisons made with the results
from the Preisendorfer (1983 a,b,c) methodology.
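
The way the two regression methods turn equation estimates into visibility categories can be sketched as below. The equations and thresholds are supplied by the caller; how the thresholds are derived (Appendix B) is not reproduced, and the function names and the use of "greater than or equal to" at a threshold are assumptions.

```python
def estimate(equation, x):
    """Evaluate a linear equation given as (intercept, coefficients)."""
    intercept, coefficients = equation
    return intercept + sum(c * v for c, v in zip(coefficients, x))


def method_1(x, equation, low, high):
    """Single equation, predictand coded 1, 2 or 3, with two thresholds."""
    y = estimate(equation, x)
    if y < low:
        return 1
    return 2 if y < high else 3


def method_2(x, eq_a, thr_a, eq_b, thr_b):
    """Decision tree: eq_a splits III from I/II, then eq_b splits II from I."""
    if estimate(eq_a, x) >= thr_a:
        return 3
    return 2 if estimate(eq_b, x) >= thr_b else 1
```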
VI.   RESULTS

A.   NORTH  PACIFIC  OCEAN 

1.  First-Predictor Selection and Interval Determination

The first predictor selected, for equally populous
intervals (EPI's) of four through ten, was EHF (Table III).
The constant value for a1, maximum-probability strategy,
indicates that there is no predictability for visibility
category II (the least frequent category in the data set)
using a single predictor. A comparison of the dependent
data adjusted a0 (AA0) and independent data adjusted threat
score for visibility category I (ATS1) subjectively determined
the selection of five EPI's for the developmental
model (Table IV; Fig. 5).

2.  Selecting Subsequent Predictors

Once the number of intervals and first predictor
were known, a new a0 computation was made with the first
predictor and each of the remaining predictors. Only six of
the remaining 21 predictors, CLIMO, SEHF, THF, DDWW, H510
and RH, in combination with EHF, gave new a0 values greater
than that for EHF alone (.697); these comprised the pool of
predictors to be considered for further development of the
model. Functional dependence (FD) with EHF was computed for
each of these six predictors and DDWW was chosen as the second
predictor because it had the lowest FD.

For the determination of the third through sixth
predictors, a new a0 was computed as a function of all of
the previously selected predictors and each of the remaining
predictors. At each stage, the new a0 computation for each
remaining predictor was greater than that for the prior
stage, so no further predictors were eliminated from consideration.
FD was then computed, for each of the predictors
being considered with each of the predictors previously
selected, and RSS FD determined. At any given stage (three
through six) the new predictor added to the developmental
model was that one with the lowest RSS FD. The third through
sixth predictors, in order of selection, are H510, RH, THF
and CLIMO (Table V).
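
The two-step selection just described (screen by a0, then take the lowest RSS FD) can be sketched as follows. The computation of FD itself is not shown, RSS is assumed to denote the root-sum-square of a candidate's FD values with each previously selected predictor, and every number other than .697 is hypothetical.

```python
import math


def screen_by_a0(a0_with_candidate, a0_previous):
    """Keep candidates whose combined a0 exceeds the previous stage's a0."""
    return [name for name, a0 in a0_with_candidate.items() if a0 > a0_previous]


def lowest_rss_fd(fd_with_selected):
    """Pick the candidate with the smallest root-sum-square FD."""
    rss = {name: math.sqrt(sum(v * v for v in values))
           for name, values in fd_with_selected.items()}
    return min(rss, key=rss.get), rss


# Hypothetical a0 values for two candidates combined with EHF (a0 = .697 alone).
pool = screen_by_a0({"DDWW": 0.705, "PS": 0.690}, 0.697)

# Hypothetical FD values of each candidate with two already-selected predictors.
best, rss = lowest_rss_fd({"H510": [0.1, 0.2], "RH": [0.3, 0.1]})
```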

3.  Determining the Final Model

The final model for the Preisendorfer (1983 a,b,c)
methodology was determined by comparing the independent data
contingency table statistics, from each developmental stage,
and choosing the fourth stage because it gave the highest
adjusted threat score for visibility category I (ATS1)
(Fig. 6). The contingency tables for stage four and the
related statistics for the three strategies are shown in Table
VI.

4.  Linear Regression

A single linear-regression equation was developed
from the North Pacific Ocean data using method 1. Both the
quadratic and equal-variance threshold models (Appendix B)
were applied but only the threshold values from the equal-variance
model were used to compare methodologies. Table
VII contains the linear regression equation, the visibility
category linear regression statistics and the threshold
values. Contingency tables and related statistics for the
dependent and independent data are shown in Table VIII.

5.  Discussion

The best results obtained from the North Pacific
Ocean data were from the Preisendorfer (1983 a,b,c) methodology,
MAXPROB2 strategy, as it has the highest independent data
adjusted threat scores for visibility categories I and combined
I/II (ATS1 = .20, ATS12 = -.05). Each of the maximum-probability
strategies (MAXPROB1: ATS1 = .17, ATS12 = -.10)
did better than linear regression (ATS1 = .16, ATS12 = -.13),
while natural-regression shows the poorest skill (ATS1 = -.02,
ATS12 = -.19).

It  appears,  from  Fig.  6,  that  most  of  the  usable 
forecastability  resides  in  the  first  predictor  chosen.   This 
would  indicate  that  it  may  be  profitable  to  search  for 
better  predictors  by  combining  model  output  parameters, 
conducting  dimensional  analysis  or  using  linear-regression 
equation  estimates  as  predictors  as  was  done  in  the  North 
Atlantic  Ocean  experiments  which  follow. 

B.   NORTH  ATLANTIC  OCEAN  AREA  3W 

Based upon the results obtained in the North Pacific
Ocean, it was decided to use the linear regression model to
generate equations which could be used as predictors. Seven
such equations were developed, each representing a different
menu of parameters available to the regression model. The
seven equations are included in Appendix D. The Preisendorfer
(1983 a,b,c) methodology then proceeded both with
and without these linear-regression equations available as
predictors.

1.  First Predictor Selection and Interval Determination

a.  Without Linear-Regression Equations as Predictors

The first predictor, for EPI's of four through
ten, varied with the number of intervals (Table IX). A
comparison of the dependent data AA0 and the independent
data ATS1 determined the selection of eight EPI's for the
model (Table X) and, therefore, SMF as the first predictor.
However, through investigator error, the model was initially
developed with five EPI's and E925 as the first predictor.
Therefore, both results will be presented.

b.  With Linear-Regression Equations as Predictors

The first predictor for each EPI of four through
ten is BM1, the predictand estimate computed by the linear
regression equation developed when all of the predictors
were available to the regression model (Table XI). Two of
the EPI's, namely four and eight, have identical, and best,
dependent data AA0 and independent data ATS1 scores (Table
XII, Fig. 7), so it was decided to proceed with the developmental
model for both intervals.

2.  Selecting Subsequent Predictors

Subsequent predictors were chosen in the same way as
described in the procedures and for the North Pacific Ocean
experiment. The predictors, not including linear regression
equations as predictors, are SMF, D850, RH, UBLW and ENTRN
for eight EPI's (Table XIII) and E925, U700, DVDP, STRTFQ,
ENTRN and PS for five EPI's (Table XIV). The predictors,
including linear regression equations as predictors, are
BM1, U850, D500, V850, D1000 and U1000 for four intervals
(Table XV) and BM1, U500, ENTRN, DVDP and BM4 for eight
intervals (Table XVI). Significance tests were made after
each predictor selection and a0(96) and a1(05) values are
included in Tables XIII, XV and XVI. A comparison of the
behavior of critical level statistics, as predictors are
added, for both four and eight intervals, is shown in Figs.
8 and 9, where array size is equal to the number of EPI's
taken to a power equal to the number of predictors included
at that stage.
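
The array size defined above grows quickly with both the number of intervals and the number of predictors. A short calculation, using the predictor counts listed for the four- and eight-interval models (the function name is hypothetical), shows the scale involved:

```python
def array_size(n_intervals, n_predictors):
    """Number of cells: EPI's raised to the number of predictors in the model."""
    return n_intervals ** n_predictors


four_epi_six_predictors = array_size(4, 6)    # 4**6
eight_epi_five_predictors = array_size(8, 5)  # 8**5
```

Six predictors at four intervals give 4,096 cells, while five predictors at eight intervals already give 32,768.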

3.  Determining the Final Model

The final model for the Preisendorfer (1983 a,b,c)
methodology was determined by comparing the independent data
contingency table statistics, from each developmental stage,
and choosing that stage which gave the highest adjusted
threat score for visibility category I (ATS1).

a.  Without Linear Regression Equations as
Predictors (Eight Intervals)

It was determined, from Fig. 10, that the fifth
stage gave the best results (MAXPROB1, independent data:
ATS1 = .19, ATS2 = .03, ATS12 = -.05). The contingency tables
for stage five and related statistics for the three strategies
are shown in Table XVII.

b.  Without Linear Regression Equations as
Predictors (Five Intervals)

It was determined, from Fig. 11, that the fifth
stage gave the best results (MAXPROB2, independent data:
ATS1 = .25, ATS2 = .02, ATS12 = .01). The contingency tables
for stage five and related statistics for the three strategies
are shown in Table XVIII.

c.  With Linear Regression Equations as
Predictors (Four Intervals)

It was determined, from Fig. 12, that the fourth
stage gave the best results (MAXPROB2, independent data:
ATS1 = .40, ATS2 = -.05, ATS12 = .12). The contingency tables
for stage four and related statistics for the three strategies
are shown in Table XIX.

d.  With Linear Regression Equations as
Predictors (Eight Intervals)

It was determined, from Fig. 13, that the second
stage gave the best results (MAXPROB2, independent data:
ATS1 = .32, ATS2 = -.14, ATS12 = .02). The contingency tables
for stage two and related statistics for the three strategies
are shown in Table XX.

4.  Linear Regression

Both linear regression methods (single equation and
decision tree) and both threshold models (quadratic and
equal variance) [Lowe, 1984a] were used to compare with the
Preisendorfer (1983 a,b,c) methodology in the North Atlantic
Ocean Area 3W. Additionally, the predictors available for
regression were varied as indicated in the following description.
The first regression was conducted with all available
MOP's while the second regression was conducted using only
the best predictors from the Preisendorfer methodology (defined
as those predictors which, alone, produced an a0 value
greater than the frequency of visibility category III in the
dependent data). Table XXI contains the linear-regression
equations, associated visibility category statistics and
threshold values. Tables XXII through XXVII contain the
contingency tables and related statistics for the dependent
and independent data for each of the linear regression
variations.

5.  Discussion

Table XXVIII summarizes each of the methodologies and
strategies applied to the North Atlantic Ocean Area 3W
data. In general, the maximum-probability strategy did
better than the other methods or strategies. Specifically,
the best results overall were obtained by the MAXPROB2
strategy, using predictors computed from linear regression
equations and four equally populous intervals. The methodology
without linear regression equations as predictors, and all
of the linear regression results, are about equivalent. The
best linear regression method is the decision tree, when all
MOP's are made available to the regression model. The results
obtained without linear regression equations as predictors
appear to discount the procedure established for choosing the
number of equally populous predictor intervals, but lend
support to the claim in Chapter V that there is a tendency
for the Preisendorfer (1983 a,b,c) methodology to give better
results using a small number of intervals.
VII.   CONCLUSIONS AND RECOMMENDATIONS

The primary objective of this study was to determine
if the Preisendorfer (1983 a,b,c) methodology applied to the
FNOC NOGAPS model output parameters could improve upon the
forecasting of atmospheric marine horizontal visibility, in
three categories, when compared to the more traditional
method of least squares, multiple linear regression. It was
shown that, indeed, the proposed methodology, namely, the
maximum probability strategy, was superior when predictand
estimates, computed from linear regression equations
themselves, were used as predictors.

The method of determining the number of equally populous
predictor intervals requires further investigation. The
results from the North Atlantic Ocean Area 3W, without
linear regression equations as predictors, showed that the
proposed method was not the best, in that the number of intervals
determined by the method was eight but better results
were obtained with five. Additionally, only ten or fewer
intervals were considered here, due to storage limitations
imposed by the computer center. As a result, the optimum
number of predictor intervals remains undetermined.

Predictor determination appears to be adequate. At each
stage of development a unique predictor was selected. The
only foreseeable problem is if, during the first (initial)
stage of development, multiple predictors have identical CE
and PP values, or, during subsequent stages, multiple predictors
have identical a0 and FD values. Should this occur,
the model development would have to proceed, from that
particular stage, with each of the identified predictors.

The methodology appears to be sensitive, in two ways, to
the first predictor selected. First, there is an initial
large value for the independent data ATS1 and small incremental
increases thereafter for each new predictor added.
Secondly, there is a large magnitude difference in the
initial independent data ATS1 values between the Preisendorfer
methodology without linear regression equations as
predictors (ATS1 = .13; .14) and that with linear regression
equations as predictors (ATS1 = .30), for the maximum
probability strategy.

The best strategy is MAXPROB2, followed by MAXPROB1, and
then natural-regression. Generally, natural-regression does
worse than linear regression. None of the methods did well
in predicting visibility category II, which may indicate
that visibility would be best handled as a two-category
phenomenon.

The number of independent data observations (1526) in
North Atlantic Ocean Area 3W was sufficient to test the
methodology. This was demonstrated by the similar results
between Area 3W, without linear regression equations as
predictors, and the North Pacific Ocean results (3682
observations). The small differences in the contingency
table statistics for the independent data for the two experiments
can be attributed to parameters being from different
models and for different months.

The  following  recommendations  are  offered  for  future 
research  and  to  future  researchers: 

1.  Investigate the problem of determining the optimum
number of equally populous predictor intervals.
Possibly, a statistic similar to the threat scores
or adjusted threat scores could be used, or, simply
choose the number of intervals, between two and ten, which
gives the highest adjusted threat scores for the independent
data. Alternatively, adopt, without further experimentation,
the number of EPI's as five, which appears to
be a compromise between a gross resolution of the
predictor parameter range and a fine (but too expensive)
resolution of the predictor parameter range.

2.  Investigate the use of potential predictability (PP)
in determining the selection of predictors. During
the initial stage of development, PP is computed for
all available predictors and provides a measure of
each predictor's individual ability to forecast
visibility, but it is not used explicitly. Perhaps
the mean and standard deviation of PP could be computed
during the initial stage, and those predictors removed
from consideration whose PP is not greater than a
value equal to the mean minus one standard deviation,
or, simply, not greater than the mean. This would
ensure that only those predictors which have a relatively
high prospect of forecasting visibility will
be available for subsequent selection.

3.  Search for better predictors which are particularly
suited to visibility prediction. Recommended sources
are: new, direct and derived, model output parameters
(including original model output); non-dimensional
parameters derived from dimensional analysis; and
boundary-layer parameters such as the optical structure
function (C^2) and extinction coefficients.

4.  Investigate  a  two-category  visibility  scheme. 

5.  Install  automatic  visibility  recorders  on  ocean-going 
military  and  civilian  passenger/cargo  ships.   This 
will  place  visibility  observations  on  a  more  objective 
basis  and  lead  to  improved  methods  of  forecasting 
visibility,  as  well  as  verifying  such  forecasts. 

6.  Investigate new prediction models, preferably those
which attempt to manipulate the observed data to
correct for probable observer bias (following Selsor,
1980; Renard and Thompson, 1984). This would be
unnecessary if recommendation 5 were acted upon.

7.  Investigate  other  ocean  areas  and  seasons  to  determine 
if  the  physically  homogeneous  area  scheme  is  consistent 
and  viable.   Develop  prediction  tables  and  other  aids 
specifically  tailored  to  region  and  season. 

8.  Use a statistic other than ATS1 for choosing the
first predictor and for comparing methods and strategies.
ATS1 was used in this study largely because of
its greater magnitude, as compared to ATS2 and ATS12.
This was due to the relatively high frequency of visibility
category I in both data sets. In general, this
will not be the case. Because three visibility categories
are being considered, and good forecasts of
the two poorest visibility categories are desirable, a
statistic such as ATS12 would be better suited as a
consistent comparison statistic for future researchers.

9.  As soon as it is feasible, eliminate from further
testing the MAXPROB1 strategy in order to allow for
more efficient and faster program execution. The
natural-regression strategy, though it gave the poorest
results in this study, should be re-examined when
predictands with relatively many discrete states
(e.g., ceiling) are considered. It has, in such
settings, potential to outperform the more rigid
linear regression technique.
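
The screening proposed in recommendation 2 could take a form such as the sketch below. It is an illustration only: the function name is hypothetical, the sample standard deviation is assumed, and apart from the EHF value of .330 (from Appendix A) the PP values are invented for the example.

```python
import statistics


def screen_by_pp(pp, k=1.0):
    """Keep predictors whose PP exceeds mean(PP) - k * stdev(PP).

    k = 1 gives the "mean minus one standard deviation" rule and
    k = 0 the simpler "greater than the mean" rule.
    """
    values = list(pp.values())
    cutoff = statistics.mean(values) - k * statistics.stdev(values)
    return sorted(name for name, value in pp.items() if value > cutoff)


# Hypothetical PP values, except EHF = .330 from the Appendix A example.
pp_values = {"EHF": 0.330, "FTER": 0.20, "RH": 0.25, "ASTD": 0.05}
kept_loose = screen_by_pp(pp_values, k=1.0)
kept_strict = screen_by_pp(pp_values, k=0.0)
```

With these illustrative values the looser rule drops only ASTD, while the stricter rule keeps only EHF and RH.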
APPENDIX A

A  DISCUSSION  OF  THE  STATISTICAL  PROCEDURES  PROPOSED  BY 
PREISENDORFER  (1983  a,b,c)  FOR  THE  FORECASTING  OF 
ATMOSPHERIC  MARINE  HORIZONTAL  VISIBILITY  USING 
MODEL  OUTPUT  STATISTICS 


I.   INTRODUCTION 

The following discussion is based upon three unpublished
research papers by Preisendorfer (1983 a,b,c). His proposed
methodology deals with a simple statistical manipulation of
model output parameters (predictors) which have been transformed
from continuous to discrete quantities by grouping
each predictor into equally populous intervals. The procedural
approach in applying his methodology to model output
statistics (MOS) forecasting is as follows:

1.  Generate  predictand/predictor  pairs  of  data  using  the 
United  States  Navy  Fleet  Numerical  Oceanography  Center 
Navy  Operational  Global  Atmospheric  Prediction  System 
(NOGAPS)  model  output  (predictors)  and  synoptic  ship 
visibility  observations  (predictand)  provided  by  the 
Naval  Oceanography  Command  Detachment,  Asheville,  NC, 
and  generate  bivariate  plots. 

2.  Generate  conditional  probability  tables  based  on  the 
distribution  of  the  predictand/predictor  pairs. 

3.  Define prediction strategies based on the conditional
probabilities.

4.  Compute the potential predictability of visibility
from the conditional probability tables.

5.  Compute  skill  scores  of  the  prediction  strategies  and 
choose  the  first  predictor. 

6.  Repeat  steps  1,  2,  4,  and  5,  for  multiple  predictors. 

7.  Compute  functional  dependence  of  selected  vs.  potential 
subsequent  predictors. 

8.  Choose  the  next  predictor. 

9.  Repeat  steps  1,  2,  4,  5,  7,  and  8,  until  model 
development  is  terminated. 

For demonstration purposes, an artificial data set of
99 cases, consisting of four predictors plus visibility
(predictand), will be used throughout this discussion.
Each predictor parameter is divided into three equally populous
intervals and visibility is divided into three categories,
as illustrated in Table A1. The four predictors are
Evaporative Heat Flux (EHF), Fog Probability Parameter
(FTER), Relative Humidity (RH) and Air-Sea Temperature
Difference (ASTD). Visibility categories are defined by the
marine visibility observation codes (MVOC) included in the
categories.

TABLE A1
ARTIFICIAL DATA SET

Interval 1          Interval 2               Interval 3

EHF ≤ 2.65          2.65 < EHF ≤ 4.44        EHF > 4.44

FTER ≤ .024         .024 < FTER ≤ .9         FTER > .9

RH ≤ 85.9           85.9 < RH ≤ 90.0         RH > 90.0

ASTD ≤ 1.02         1.02 < ASTD ≤ 1.91       ASTD > 1.91

Visibility Category I:     MVOC 90 -> 94  (60 cases)
Visibility Category II:    MVOC 95 & 96   (20 cases)
Visibility Category III:   MVOC 97 -> 99  (19 cases)
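
Dividing a predictor into equally populous intervals, as in Table A1, amounts to cutting the sorted sample at equal counts. A minimal sketch follows; the function names are hypothetical, intervals are assumed closed on the right (x ≤ edge), and tied values at an edge can make the populations slightly unequal.

```python
import bisect


def equal_population_edges(values, m):
    """Upper edges of the first m-1 of m equally populous intervals."""
    s = sorted(values)
    n = len(s)
    # Edge k is the largest value among the first k*n/m sorted cases.
    return [s[(k * n) // m - 1] for k in range(1, m)]


def interval_index(x, edges):
    """Zero-based interval of x; a value equal to an edge stays below it."""
    return bisect.bisect_left(edges, x)


edges = equal_population_edges(range(1, 100), 3)    # 99 cases, 3 intervals
ehf_interval = interval_index(3.0, [2.65, 4.44])    # EHF edges from Table A1
```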


II.   SINGLE  PREDICTOR  STATISTICS 

A.   BIVARIATE  PAIRS 

Choose various visibility-predictor pairs and make
bivariate plots of these pairs. This will provide immediate
visual estimation of the potential predictability. As an
example, let us suppose that predictor EHF of our artificial
data set has 33 cases in each equally populous interval and
that the visibility categories I, II and III are respectively
represented by 17, 7 and 9 in interval 1; 1, 7 and 25 in
interval 2; 1, 6 and 26 in interval 3. To make the bivariate
plot, simply make a tabular summary of this information, as
illustrated in Fig. 14. Now we define, from the bivariate
plot, our coordinate system and nomenclature. Items in
parentheses are examples from Fig. 14; numbers in brackets
are equation numbers from Preisendorfer (1983 a,b,c) with
a letter designator indicating the paper from which each was
obtained.

n = number of visibility categories (n = 3)

m = number of equally populous predictor intervals (m = 3)

j = the vertical counting index (j = 1,...,n)

i = the horizontal counting index (i = 1,...,m)

n(i,j) = individual cell counts (n(1,3) = 9)

n(.,j) = marginal predictand totals = $\sum_{i=1}^{m} n(i,j)$ =
row totals (n(.,2) = 20) [3.1a]

n(i,.) = marginal predictor totals = $\sum_{j=1}^{n} n(i,j)$ =
column totals (n(2,.) = 33) [3.2a]

n(.,.) = total predictand/predictor pairs =
$\sum_{j=1}^{n} \sum_{i=1}^{m} n(i,j)$ = sum over all cells (n(.,.) = 99)
[3.3a]

B.   CONDITIONAL  PROBABILITIES 

From the bivariate pairs determine the conditional probability
of visibility given a predictor. We will continue from
the bivariate plot in Fig. 14, and define three probabilities:

p12(i,j) = n(i,j)/n(.,.) = joint probability of a
predictand-predictor pair occurring in a
given cell = individual cell count
divided by the total number of cases
(p12(3,3) = 26/99 = .2626) [3.5a]

p1(i) = n(i,.)/n(.,.) = marginal probability of
predictor = column total divided by the
total number of cases = the column sum of
the joint probabilities
(p1(2) = 33/99 = .333) [3.6a]

p2(j) = n(.,j)/n(.,.) = marginal probability of
predictand = row total divided by the
total number of cases = the row sum of the
joint probabilities (p2(2) = 20/99 = .202)
[3.7a]


We can now build a joint/marginal probability table as
illustrated in Fig. 15, and define conditional probability:

p21(j|i) = p12(i,j)/p1(i) = n(i,j)/n(i,.) =
conditional probability of predictand given
a predictor = a cell's joint probability
divided by the marginal probability of
predictor = individual cell count divided
by column total
(p21(2|2) = .071/.333 = 7/33 = .212) [3.8a]


Now build a conditional probability table as illustrated
in Fig. 16. Conditional probability of visibility, given
some predictor, is the quantity of greatest interest in this
study. Note that if p21(j|i) = 1/n for j = 1,...,n at
some i (i.e., each cell contains 1/n of the cases in its
column), then very little information is available to predict
visibility at that i. However, if p21(j0|i) = 1 for some
j0 and p21(j|i) = 0 for all other j values, then there is
perfect predictability of class j0 by the predictor at class
i. The underlying methodology of this study will be to
determine the maximum conditional probability of visibility
for each predictor value.
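
The counts, marginal totals and probabilities defined in sections A and B can be reproduced with a few lines of code. The sketch below uses the EHF example; the interval 2 counts are taken as 1, 7 and 25, an assumption consistent with the column total of 33 and with p21(2|2) = 7/33. The variable names are chosen for this illustration only.

```python
# counts[i][j]: cases in predictor interval i+1 and visibility category j+1
# (EHF example; interval 2 is assumed to hold 1, 7 and 25 cases).
counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]
m, n = len(counts), len(counts[0])

col_totals = [sum(col) for col in counts]                        # n(i,.)
row_totals = [sum(col[j] for col in counts) for j in range(n)]   # n(.,j)
total = sum(col_totals)                                          # n(.,.)

p12 = [[c / total for c in col] for col in counts]   # joint probabilities
p1 = [t / total for t in col_totals]                 # marginal, predictor
p2 = [t / total for t in row_totals]                 # marginal, predictand
# p21[i][j] holds p21(j+1 | i+1), the conditional probability.
p21 = [[c / col_totals[i] for c in counts[i]] for i in range(m)]

print(round(p12[2][2], 4), round(p1[1], 3), round(p2[1], 3), round(p21[1][1], 3))
```

The printed values, 0.2626, 0.333, 0.202 and 0.212, match the worked examples attached to equations [3.5a] through [3.8a].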

C.   STRATEGIES

Preisendorfer (1983 a,b,c) presents three different
prediction strategies, two based on maximum probabilities
(MAXPROB1 and MAXPROB2) and one based on natural regression.

1.  Maximum Probability

This strategy consists of determining the cell, in a
given column, with the highest conditional probability, and
assigning to the column the visibility category associated with
that cell. As each column represents an interval of predictor
values, we now have a visibility forecast value associated
with that interval. In our example with EHF (Fig. 16),
interval 1 (i = 1) will have a forecast value of visibility
category I (VISCAT 1). Hence, if we used only EHF as a
predictor, every time a value of EHF ≤ 2.65 was encountered,
we would predict visibility category I. Similarly,
for interval 2 (i = 2) and for interval 3 (i = 3)
we would choose visibility category III (VISCAT 3).

MAXPROB1 and MAXPROB2 differ only in the way they
handle a tie between maximal conditional probabilities in
a column. Should this occur, then a decision must be made
as to which predictand category will be assigned to that
predictor interval. In MAXPROB1, this decision is made by
a coin toss, figuratively. A random number, in the unit
interval, is generated. The unit interval is divided into a
number of subintervals equal to the number of tied values
and each subinterval is assigned to a specific predictand
category. The subinterval into which the random number
falls determines the forecast visibility category. In
MAXPROB2, the lowest predictand category, among the tied
categories, is chosen.
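
The two maximum-probability strategies can be sketched as one function with a switch for the tie rule. Cell counts are compared instead of conditional probabilities, which gives the same maximum within a column; the function name, the equal width of the subintervals and the example counts for interval 2 (1, 7, 25) are assumptions of this sketch.

```python
import random


def maxprob_forecast(col_counts, strategy="MAXPROB2", rng=random):
    """Forecast category (1-based) for one predictor interval."""
    best = max(col_counts)
    tied = [j + 1 for j, c in enumerate(col_counts) if c == best]
    if len(tied) == 1 or strategy == "MAXPROB2":
        return tied[0]  # unique maximum, or lowest tied category
    # MAXPROB1: a random number in the unit interval selects one of
    # len(tied) equal subintervals, each assigned to a tied category.
    u = rng.random()
    return tied[int(u * len(tied))]


ehf_counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]
forecasts = [maxprob_forecast(col) for col in ehf_counts]
```

For the EHF example this yields categories I, III and III for intervals 1, 2 and 3, in agreement with the text; no ties occur, so both strategies give the same result.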

2.  Natural Regression

This strategy consists of first finding the average
predictand (visibility category) for each predictor interval,
using conditional probabilities, and then choosing the
predictand category nearest the average.

$$\bar{j}(i) = \sum_{j=1}^{n} j \, p_{21}(j|i) \qquad [7.1b]$$

Fig. 17 shows the computation for EHF interval 1 (i = 1).
Visibility category II (VISCAT 2) would be assigned to this
interval by this strategy.
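
The natural-regression computation for EHF interval 1 can be checked with a short sketch. The function name is hypothetical, and the choice of the lower category when the average falls exactly midway between two categories is an assumption, since that case is not addressed above.

```python
def natural_regression_forecast(col_counts):
    """Return (average category, nearest category) for one interval."""
    total = sum(col_counts)
    # Average predictand: sum over j of j * p21(j|i), with j counted from 1.
    mean_cat = sum((j + 1) * c for j, c in enumerate(col_counts)) / total
    nearest = min(range(1, len(col_counts) + 1), key=lambda j: abs(j - mean_cat))
    return mean_cat, nearest


mean_cat, category = natural_regression_forecast([17, 7, 9])  # EHF interval 1
```

The average is 58/33, about 1.76, so category II is assigned, as stated for Fig. 17.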

D.   COMPARISON  STATISTICS 

To determine if a predictor will be useful in forecasting,
there should be a statistic with which to compare its potential
utility. Preisendorfer (1983a,b,c) defines four such
statistics and their critical values. The four statistics
defined are potential predictability (PP), class-error
probabilities (a0, a1), and functional dependence (FD).
Potential predictability and class-error probabilities will
be defined now. Functional dependence will be addressed
later.

1.  Potential Predictability

Potential  predictability  of  a  predictand/predictor 
pair  is  defined  as: 


$$PP(2|1) = \frac{n}{n-1} \sum_{i=1}^{m} p_1(i) \left[ \sum_{j=1}^{n} \left( p_{21}(j|i) - \frac{1}{n} \right)^2 \right] = \sum_{i=1}^{m} p_1(i) \, PP(i)$$

where

$$PP(i) = \frac{n}{n-1} \sum_{j=1}^{n} \left( p_{21}(j|i) - \frac{1}{n} \right)^2,$$

$p_1(i)$ = the marginal probability of a predictor, and

$p_{21}(j|i)$ = the conditional probability of the jth
predictand, given the ith predictor.   [4.1a]


PP(2|1) is loosely related to Shannon's definition of
information [Preisendorfer, 1983a]. An example calculation is
shown in Fig. 18 where EHF has a PP value of .330. To
determine if this would be the best predictor using this
statistic, compute the potential predictability for all
predictors and rank them from highest to lowest. The
predictor with the highest PP should be the best predictor
for forecasting visibility using any strategy.
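
The PP computation can be sketched as below. The function name and the table layout (`counts[i][j]`, one list per predictor interval) are hypothetical, and the EHF counts are an assumption: they were inferred from the conditional probabilities quoted in this appendix (.515, .212, .758, .788, .182) with 33 cases per interval, not read from Figs. 15 and 18.

```python
def potential_predictability(counts):
    """PP(2|1) from a bivariate count table.

    counts[i][j] = number of cases in predictor interval i+1 and
    predictand category j+1.
    """
    total = sum(sum(column) for column in counts)
    n = len(counts[0])  # number of predictand categories
    pp = 0.0
    for column in counts:
        column_total = sum(column)
        if column_total == 0:
            continue  # an empty interval contributes nothing
        p1 = column_total / total  # marginal probability p1(i)
        pp_i = n / (n - 1) * sum((c / column_total - 1 / n) ** 2 for c in column)
        pp += p1 * pp_i
    return pp


# Assumed EHF counts: rows are intervals 1..3, entries are VISCAT 1..3
ehf_counts = [[17, 7, 9], [1, 7, 25], [1, 6, 26]]
pp_ehf = potential_predictability(ehf_counts)
```

With these assumed counts the result rounds to .330, the PP value quoted for EHF. A column with uniform conditional probabilities contributes PP(i) = 0 and a column with all cases in one category contributes PP(i) = 1.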
2.  Class-Error Probabilities

Zero-class (a0) and one-class (a1) error probabilities
can be defined to gauge the predictive skill of a
prediction strategy.

$$a_0 = \sum_{i=1}^{m} p_1(i) \, p_{21}(j_0(i)|i)$$

where:

$p_1(i)$ = the marginal probability of the predictor,

$j_0(i)$ = the $j_0$th cell in column i assigned by
the prediction strategy, and

$p_{21}(j_0(i)|i)$ = the conditional probability of the $j_0(i)$th cell.   [6.1a]


From Figs. 15 and 16, p1(i) = .333 for all i; j0(1) = 1,
p21(j0(1)|1) = .515; j0(2) = 3, p21(j0(2)|2) = .758; and
j0(3) = 3, p21(j0(3)|3) = .788. Therefore, if EHF is the only
predictor,

a0 = (.333)(.515) + (.333)(.758) + (.333)(.788) = .686

The statistic a0 is, by definition, equal to the fraction of
correct forecasts in the dependent data set.


$$a_1 = \sum_{i=1}^{m} p_1(i) \left[ p_{21}(j_0(i)+1|i) + p_{21}(j_0(i)-1|i) \right]$$

where:

$p_{21}(j_0(i) \pm 1|i)$ = the conditional probabilities
adjacent to the $p_{21}(j_0(i)|i)$ values used in the a0
determination.

If j0 = 1 then, by definition, $p_{21}(j_0(i)-1|i) = 0$; similarly
if j0 = n then, by definition, $p_{21}(j_0(i)+1|i) = 0$.   [6.2a]
The statistic a1 is, by definition, equal to the fraction of
forecasts for which a class 1 error has been committed.
Again, from Figs. 15 and 16:

a1 = (.333)(.212 + 0) + (.333)(.212 + 0) + (.333)(.182 + 0)
   = .202
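
Both class-error statistics can be computed from the same count table, as in the sketch below. The function name and table layout are hypothetical, and the EHF counts are again an assumption inferred from the quoted conditional probabilities with 33 cases per interval. Because p1(i) p21(j|i) is the joint probability of cell (i,j), each term reduces to a cell count divided by the total.

```python
def class_errors(counts, assigned):
    """Return (a0, a1) for one predictor.

    counts[i][j] = cases in predictor interval i+1, predictand category j+1.
    assigned[i]  = 1-based category j0(i) chosen for interval i+1.
    """
    total = sum(sum(column) for column in counts)
    n = len(counts[0])
    a0 = 0.0
    a1 = 0.0
    for column, j0 in zip(counts, assigned):
        a0 += column[j0 - 1] / total       # cell the strategy selected
        if j0 > 1:
            a1 += column[j0 - 2] / total   # adjacent cell j0 - 1
        if j0 < n:
            a1 += column[j0] / total       # adjacent cell j0 + 1
    return a0, a1


# Assumed EHF counts and the maximum-probability assignments 1, 3, 3
a0, a1 = class_errors([[17, 7, 9], [1, 7, 25], [1, 6, 26]], [1, 3, 3])
```

Under these assumptions a0 = 68/99 and a1 = 20/99, which agree with the values .686 and .202 above to within the rounding of the three-digit probabilities.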


To determine which one of two or more predictors is
the most skillful, we can plot the (a0, a1) pairs on a skill
diagram as in Fig. 19. The dashed lines are lines of constant
class error (CE = a1 + 2a2) and the more skillful
predictors will lie on the lower right part of the triangle.
In general, the skill on the diagram decreases according to
the zig-zag rule shown in the figure. If, for all predictors,
a1 is constant, which may occur during the first
predictor determination with a data set containing relatively
few poor visibility cases, then the best predictor is that
one with the greatest a0 value. In this instance there is
no need to plot the pairs.

III.   MULTIPLE PREDICTOR STATISTICS

Once all predictand/predictor pairs have been formed
and potential predictability and skill scores determined,
the predictors can be ordered by decreasing predictor skill
and by potential predictability. Fig. 20 contains the
bivariate plot, conditional probabilities, potential
predictability and skill scores for the remaining three
predictors in our artificial data set. The ordering of predictors
is shown in Table A2. Therefore, EHF would be chosen as
our first predictor, as illustrated on the skill diagram
in Fig. 19. As RH, FTER and ASTD have equal a0 and a1
values, they are ranked according to decreasing potential
predictability.

TABLE A2

RANKING OF PREDICTORS BY SKILL
AND POTENTIAL PREDICTABILITY

Rank    Predictor    a0      a1      PP
1st     EHF          .686    .202    .330
2nd     RH           .606    .202    .225
3rd     FTER         .606    .202    .211
4th     ASTD         .606    .202    .209

Preisendorfer (1983b) develops statistics, similar to
those already mentioned, for multiple predictors. The main
conceptual difficulty of additional predictors is the
increase of dimensions. One predictor presents a relatively
simple two-dimensional problem (predictor 1 vs. predictand);
two predictors present a three-dimensional problem (predictor 1
vs. predictor 2 vs. predictand); three or more predictors
present four-dimensional and larger problems. However, with
a little manipulation, all of the multi-dimensional problems
greater than two dimensions can be reduced to a two-dimensional
problem. This is illustrated in Figs. 21 and 22 for three
dimensions (two predictors) and four dimensions (three
predictors). An easily programmable equation can be developed to
create these two-dimensional arrays based upon the number of
equally populous intervals for each predictor and upon the
interval in which a particular data case resides.

In our continuing example, reduce the equally populous
intervals for each predictor to an integer number (i = 1,...,m)
with 1 corresponding to the lowest interval and m corresponding
to the highest interval, as defined for the predictor
index in Section II.A. Let

ii = the interval integer number for EHF,

jj = the interval integer number for RH,

kk = the interval integer number for FTER,

mm = the interval integer number for ASTD,

ll = the column location in the two-dimensional
bivariate plot (equivalent to i for a
single predictor),

IGP1 = the total number of intervals for EHF,

IGP2 = the total number of intervals for RH,

IGP3 = the total number of intervals for FTER,

IGP4 = the total number of intervals for ASTD.

Then, for one predictor, EHF:

ll = ii

for two predictors, EHF and RH:

ll = IGP2(ii-1) + jj

for three predictors, EHF, RH and FTER:

ll = IGP2(ii-1+IGP1(kk-1)) + jj

for four predictors, EHF, RH, FTER and ASTD:

ll = IGP2(ii-1+IGP1(kk-1+IGP3(mm-1))) + jj

This equation form can be expanded to accommodate any number
of predictors.
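
The nested form of these equations lends itself to a short loop. The sketch below is an illustration only; the function names and the tuple arguments are hypothetical. It follows the ordering implied by the equations above: the second predictor (RH) varies fastest, followed by the first, third, fourth, and so on.

```python
from itertools import product


def column_index(intervals, sizes):
    """Column location in the two-dimensional bivariate plot.

    intervals = 1-based interval numbers (ii, jj, kk, mm, ...)
    sizes     = interval counts (IGP1, IGP2, IGP3, IGP4, ...)
    """
    if len(intervals) == 1:
        return intervals[0]
    # Predictors 1, 3, 4, ... form the nested part of the equation
    order = [0] + list(range(2, len(intervals)))
    inner = 0
    for pos in reversed(order):
        inner = (intervals[pos] - 1) + sizes[pos] * inner
    # Predictor 2 enters last as IGP2 * (...) + jj
    return sizes[1] * inner + intervals[1]


def all_column_indices(sizes):
    """Sorted column locations for every combination of intervals."""
    ranges = [range(1, size + 1) for size in sizes]
    return sorted(column_index(combo, sizes) for combo in product(*ranges))
```

For example, with three intervals per predictor, `column_index((2, 3), (3, 3))` gives 6. Every combination of intervals maps to a distinct column between 1 and the product of the interval counts, which is what allows the multi-dimensional problem to be stored as a two-dimensional array.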


IV.   FUNCTIONAL  DEPENDENCE 

After the first predictor has been selected, either from
its skill score or potential predictability, we need a means
to determine whether or not to add a new predictor to the
one(s) already chosen. For this purpose, Preisendorfer
(1983c) proposes a functional dependence index (FD) which
describes the dependence of the new predictor being considered
upon those already in the set of predictors. If FD is large
(on the scale 0 to 1) then the new predictor can be represented
by predictors already chosen and its inclusion into the set of
predictors would be redundant. However, if FD is small (on
the scale 0 to 1) then the new predictor is likely to be a
useful addition to the existing collection of predictors.


$$FD(2|1) = \frac{m}{2(m-1)} \sum_{i=1}^{m} \sum_{j=1}^{n} p_{12}(i,j) \, |q(i,j) - r(i,j)|$$   (2.1c)

where

$$q(i,j) = \sum_{k=1}^{n-j} p_{21}(j+k|i+1) + \sum_{k=1}^{j-1} p_{21}(j-k|i-1)$$   (2.2c)

= the sum of the conditional probabilities
which lie in column i+1 and rows greater
than j and the conditional probabilities
which lie in column i-1 and rows less than j

= the sum of the conditional probabilities to
the right and up, and to the left and down.
The upper left (1,n) and lower right (m,1)
cells will always have q values equal to zero.

$$r(i,j) = \sum_{k=1}^{j-1} p_{21}(j-k|i+1) + \sum_{k=1}^{n-j} p_{21}(j+k|i-1)$$   (2.3c)

= the sum of the conditional probabilities
which lie in column i+1 and rows less than j
and the conditional probabilities which lie
in column i-1 and rows greater than j

= the sum of the conditional probabilities
to the right and down, and to the left and up.
The upper right (m,n) and lower left (1,1)
cells will always have r values equal to zero.

$p_{12}(i,j)$ and $p_{21}(j'|i')$ = the joint and conditional
probabilities defined earlier, differing
only in that the abscissa and ordinate are
now predictor vs. predictor vice predictor
vs. visibility.

Fig. 23 illustrates the FD computation for RH given EHF.
In this example, FD(2|1) = FD(RH|EHF) = .286.
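
A sketch of the FD computation is given below, under the assumption that the table is stored as `counts[i][j]` with i indexing intervals of the predictor already chosen (columns) and j indexing intervals of the candidate predictor (rows); the function name is hypothetical. The data of Fig. 23 are not reproduced here, so the value .286 is not checked; the only check made is the limiting case in which the candidate is completely determined by the chosen predictor.

```python
def functional_dependence(counts):
    """FD(2|1) from a predictor-vs-predictor count table.

    counts[i][j] = cases in column i+1 (chosen predictor interval)
    and row j+1 (candidate predictor interval).
    """
    m = len(counts)
    n = len(counts[0])
    total = sum(sum(column) for column in counts)
    # Conditional probabilities p21(j|i) within each column
    cond = [
        [c / sum(column) if sum(column) else 0.0 for c in column]
        for column in counts
    ]
    zeros = [0.0] * n
    weighted_sum = 0.0
    for i in range(m):
        right = cond[i + 1] if i + 1 < m else zeros
        left = cond[i - 1] if i > 0 else zeros
        for j in range(n):
            # q: right and up, plus left and down
            q = sum(right[j + 1:]) + sum(left[:j])
            # r: right and down, plus left and up
            r = sum(right[:j]) + sum(left[j + 1:])
            weighted_sum += counts[i][j] / total * abs(q - r)
    return m / (2 * (m - 1)) * weighted_sum
```

When all cases lie on the diagonal of a square table (or on the anti-diagonal), the double sum equals 2(m-1)/m and the index is 1, the upper end of the 0 to 1 scale.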


V.   CRITICAL  VALUES 

Once  the  various  statistics  have  been  found,  a  means  to 
determine  whether  they  are  significant  must  be  established. 
Preisendorfer  (1983  a,b,c)  proposes  the  use  of  Monte  Carlo 
means,  applied  as  follows. 

From the bivariate plot, as in Figs. 14, 21b and 22b,
we determine the marginal probabilities of the predictor
(p1(i)) and establish incremental values from 0 to 1 (note
that for equally populous predictor intervals, p1(i) = 1/m,
a constant, where m = the number of intervals). We then cast
a total of n(.,.) randomly generated numbers into the
intervals to simulate a new data set. After each randomly
generated data case is cast into a column, it is placed into
a cell using uniform probability. Fig. 24 shows the incremental
values associated with the bivariate plot in Fig. 21b.
In our continuing example we have n(.,.) = 99, so we would
generate 99 random numbers in the unit interval. All random
numbers ≤ .071 would be placed in column i = 1; those greater
than .071 and ≤ .192 would be placed in column i = 2; and
so on. As each data case is placed into a column, a single
random number is generated to determine into which cell the
case is to be placed (e.g., a random number ≤ .33 would be
counted in cell (i,1); a random number greater than .33 and
≤ .66 would be counted in cell (i,2); etc.). After all 99
cases have been cast into their appropriate cells, all of
the statistics previously discussed would be computed and
saved. This process would be repeated 100 times so that we
would have an array containing 100 randomly generated potential
predictabilities, a0's, a1's and FD's. These would be
sorted from lowest to highest and the 96th (PP(96), a0(96),
a1(96) and FD(96)) value would determine the upper 5% critical
value and the 5th (PP(05), a0(05), a1(05) and FD(05)) value
would determine the lower 5% critical value. For all statistics
other than FD, we want values from our dependent data
set to be greater than the upper 5% or less than the lower
5% critical values. For FD we want values lower than the
upper 5% critical value to ensure that our second, and subsequent,
predictor is not significantly dependent on the previous
predictor(s).
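
The Monte Carlo procedure can be sketched as follows. The function names, the seed argument and the use of Python's `random` module are assumptions made for illustration; `statistic` stands for any function of the simulated count table (PP, a0, a1 or FD).

```python
import random
from bisect import bisect_left

TRIALS = 100  # number of simulated data sets, as in the text


def simulate_counts(marginals, n_categories, n_cases, rng):
    """Cast n_cases random cases into an m x n count table."""
    # Incremental (cumulative) values of the marginal probabilities
    cumulative = []
    running = 0.0
    for p in marginals:
        running += p
        cumulative.append(running)
    counts = [[0] * n_categories for _ in marginals]
    for _ in range(n_cases):
        # First random number selects the column
        i = min(bisect_left(cumulative, rng.random()), len(marginals) - 1)
        # Second random number selects the cell with uniform probability
        j = min(int(rng.random() * n_categories), n_categories - 1)
        counts[i][j] += 1
    return counts


def critical_values(statistic, marginals, n_categories, n_cases, seed=0):
    """Return the 5th and 96th of TRIALS sorted simulated statistics."""
    rng = random.Random(seed)
    values = sorted(
        statistic(simulate_counts(marginals, n_categories, n_cases, rng))
        for _ in range(TRIALS)
    )
    return values[4], values[95]
```

For the continuing example one would call `critical_values` with three equal marginals, three categories and 99 cases, once per statistic, and compare the dependent-data values against the returned pair.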


VI.   CHOOSING  PREDICTORS 

The first predictor is determined as shown in Section III;
that is, by computing initial PP, a0 and a1 values for each
predictor, ordering them by skill score and PP and choosing
the one with the greatest skill score, or greatest PP in the
event that all skill scores are identical.

Subsequent predictors will be subjected to two tests:
functional dependence and skill score. Let

p = the number of predictors already chosen,

a0(k-1) and a1(k-1) = the 0- and 1-class errors
of the previous stage of construction of the
developmental model,

k = the index of the current stage.

Then, for the next (kth) predictor to be accepted it should
meet the following three conditions:

(1)   FD < FD(96|i)   (i = 1,p)

(2)   a0(k) > a0(k-1)   and   a1(k) ≤ a1(k-1)

(3)   a0(k) > a0(96)   and   a1(k) < a1(05)

If condition (1) is not met but conditions (2) and (3) are,
then a predictor may still be used, but the increase of
predictability of the predictand will, on average, be less
than if condition (1) had been met. However, if conditions
(2) and (3) are not met, then the predictor should not be
considered further. Repeat this process at all stages for
all remaining predictors until no further predictors are
available, then stop the construction of the developmental
model.
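
The three conditions can be collected into one screening function, sketched below. The function name, argument names and return strings are hypothetical, the numbers in the usage lines are invented for illustration, and the second inequality in condition (2) is taken to be "less than or equal".

```python
def screen_predictor(fd_values, fd_upper, a0_k, a1_k,
                     a0_prev, a1_prev, a0_upper, a1_lower):
    """Apply conditions (1)-(3) to a candidate kth predictor.

    fd_values[i] = FD of the candidate given chosen predictor i
    fd_upper[i]  = FD(96|i), the upper 5% critical value
    a0_upper     = a0(96); a1_lower = a1(05)
    """
    cond1 = all(fd < crit for fd, crit in zip(fd_values, fd_upper))
    cond2 = a0_k > a0_prev and a1_k <= a1_prev
    cond3 = a0_k > a0_upper and a1_k < a1_lower
    if not (cond2 and cond3):
        return "reject"
    # Conditions (2) and (3) hold; (1) decides how useful the addition is
    return "accept" if cond1 else "accept with reduced gain"


# Invented values, for illustration only
decision = screen_predictor([0.20], [0.30], 0.70, 0.20, 0.686, 0.202, 0.45, 0.30)
```

The middle outcome reflects the statement above that a predictor failing only condition (1) may still be used, with a smaller expected increase in predictability.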
VII.   TESTING THE DEVELOPMENTAL MODEL ON INDEPENDENT DATA

Once the model has been developed and no further predictors
remain to be considered, we can test it for skills
(a0, a1) on an independent data set (any set whose numbers
were not used to develop the model). This is easily accomplished
by sorting the independent data case values into
predictor intervals, determined from the dependent data, and
calculating the location in the forecast array (ll in Figs.
21b and 22b) of the appropriate prediction, using the equations
established in Section III. It is to be expected that
on average the test (a0, a1) points on the skill diagram, for
an independent data set, will not be as skillful as on the
set of developmental points.

APPENDIX B
LINEAR REGRESSION AND THRESHOLD MODELS

A.   LINEAR  REGRESSION 

In  this  study  a  least-squares,  multiple  linear  regression 
model,  known  as  BMDP2R  in  the  BMDP  Statistical  Software 
[University  of  California,  1981] ,  was  used.   The  procedure 
used  is  called  forward  step-wise  selection  and  picks  the 
predictors  (of  the  many  offered)  that  have  the  highest 
correlation  with  the  predictand  (visibility)  based  upon  F-to- 
enter  and  F-to-remove  limits,  where  F  is  a  ratio  which  tests 
the  significance  of  the  coefficients  of  the  predictors  in 
the  regression  equation. 

The regression model fitted to the data is

$$y = a + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p + \varepsilon$$

where:

$y$ = the dependent variable (predictand), which can
be either a continuous function or a discrete
value

$x_1, \ldots, x_p$ = the independent variables (predictors)

$b_1, \ldots, b_p$ = the regression coefficients

$a$ = the intercept

$p$ = the number of independent variables

$\varepsilon$ = the error with mean zero.

The predicted value $\hat{y}$, and the general form of the resulting
equation, is

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p$$

The step-wise selection of predictors continues until there
are no predictors remaining which meet the F-to-enter criteria.
The regression equation generated at each step is printed,
along with its R-value (the correlation of the dependent
variable $y$ with the predicted value $\hat{y}$) and $R^2$. The resulting
set of equations, one for each step, is reviewed, and that
equation containing only those predictors which increased
$R^2$ by at least .01 is retained for application.

The  role  of  regression,  once  appropriate  predictor 
variables  have  been  selected,  is  simply  that  of  dimension 
reduction (representing a multivariate structure by a univariate
proxy which constitutes a classificatory or predictive
index). This proxy takes the form of a polynomial, linear
in  its  coefficients,  of  the  components  of  the  multivariate 
structure.   The  problem  now  becomes  one  of  determining  the 
form  of  the  state  conditional  distributions  (one  for  each 
group  of  interest;  e.g.,  1,  2  and  3  for  visibility  categories 
I,  II  and  III,  as  used  in  this  study) .   Once  an  appropriate 
form  has  been  selected,  it  remains,  then,  to  determine  the 
parameters  of  the  class  conditional  distributions  (e.g., 
means  and  variances)  and  then  apply  the  decision  criteria  or 
threshold  model. 
B.   THRESHOLDS [LOWE, 1984a]

1.   Notation

$E \equiv$ an event; this is an indicator variable:
when E = 1, the threatening event occurs, and
when E = 0, the non-threatening event occurs.

$C \equiv$ the classification of an unknown event:
when C = 1, the event is classified as a
threat, and when C = 0, the event is classified
as a non-threat.

$P[E=1] \equiv$ unconditional probability of occurrence of
threat.

$P[E=0] \equiv$ unconditional probability of occurrence of
non-threat.

Error of the 1st kind (false alarm): $[C=1 \cap E=0]$.

Error of the 2nd kind (miss): $[C=0 \cap E=1]$.

$P[C=1 \cap E=0] \equiv$ joint probability of an error of the 1st
kind.

$P[C=0 \cap E=1] \equiv$ joint probability of an error of the
2nd kind.

$P[C=1|E=0] \equiv$ class conditional probability of
misclassifying a non-threat.

$P[C=0|E=1] \equiv$ class conditional probability of
misclassifying a threat.

$P[C=1 \cap E=0] = P[C=1|E=0] \, P[E=0]$.

$P[C=0 \cap E=1] = P[C=0|E=1] \, P[E=1]$.

$z \equiv$ a value of the predictive index (equivalent
to $\hat{y}$, above).

$Z \equiv$ range of the predictive index on the real line.

For a dichotomous problem, Z is divided into two parts $Z_0$, $Z_1$:

C = 0 if $z \in Z_0$

C = 1 if $z \in Z_1$

The decision regions are mutually exclusive and exhaustive
(i.e., $Z_0 \cap Z_1 = \emptyset$ and $Z = Z_0 \cup Z_1$).

Thresholds $\equiv$ boundary(s) between decision regions.

$p(z|E=0) \equiv$ class conditional density of z given
that E = 0.

$p(z|E=1) \equiv$ class conditional density of z given
that E = 1.

$\Lambda(z) = p(z|E=1)/p(z|E=0) \equiv$ the maximum likelihood
ratio (i.e., the ratio of class conditional
densities).

$p_e = p\{[C=1 \cap E=0] \cup [C=0 \cap E=1]\} \equiv$ the total
probability of error.


2.   Minimum Probability of Error Criterion

$p_e$ = probability of an incorrect classification.

$$p_e = p[C=1|E=0] \, p[E=0] + p[C=0|E=1] \, p[E=1]$$

where $p[E=1] + p[E=0] = 1$. Note that the events E = 1
and E = 0 are mutually exclusive and exhaustive. The objective
is to select decision regions (thresholds) so as to
minimize $p_e$.

$$p[C=0|E=1] = \int_{z \in Z_0} p(z|E=1) \, dz$$

= the probability of misclassifying E = 1.

$$p[C=0|E=1] = \int_{z \in Z_0} p(z|E=1) \, dz + \int_{z \in Z_1} p(z|E=1) \, dz - \int_{z \in Z_1} p(z|E=1) \, dz$$

$$p[C=0|E=1] = 1 - \int_{z \in Z_1} p(z|E=1) \, dz$$

$$p[C=1|E=0] = \int_{z \in Z_1} p(z|E=0) \, dz$$

These are substituted into the expression for $p_e$; then,

$$p_e = p[E=0] \int_{z \in Z_1} p(z|E=0) \, dz + p[E=1] \left[ 1 - \int_{z \in Z_1} p(z|E=1) \, dz \right]$$

and algebraic rearrangement yields,

$$p_e = p[E=1] + \int_{z \in Z_1} \{ p[E=0] \, p(z|E=0) - p[E=1] \, p(z|E=1) \} \, dz$$

In order to minimize $p_e$, $Z_1$ (the decision region for C = 1)
will include all those values of z for which the integrand
in the expression for $p_e$ will be negative. The decision
regions can be symbolically represented as follows:

$$Z_0 = \{ z: p[E=0] \, p(z|E=0) - p[E=1] \, p(z|E=1) > 0 \}$$

$$Z_1 = \{ z: p[E=0] \, p(z|E=0) - p[E=1] \, p(z|E=1) < 0 \}$$

An alternative representation is given by,

$$Z_0 = \{ z: p[E=0] \, p(z|E=0) > p[E=1] \, p(z|E=1) \} = \{ z: p[E=0]/p[E=1] > p(z|E=1)/p(z|E=0) \}$$

Likewise,

$$Z_1 = \{ z: p[E=0]/p[E=1] < p(z|E=1)/p(z|E=0) \}$$

These statements can be combined to give,

$$\frac{p(z|E=1)}{p(z|E=0)} = \Lambda(z) \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{p[E=0]}{p[E=1]}$$

Thresholds are the value(s) of z for which

$$\Lambda(z) = p[E=0]/p[E=1]$$

This equation can be solved for z either analytically or
numerically depending on the forms of the density functions.

3.   Threshold Cases

In order to exemplify the model, the assumption is
made that the class conditional distributions are Gaussian.
There are essentially three distinct cases that can arise.

a.   Case I: Equal variances; different means
     (Referred to as the equal variance model in the
     text)

$$p(z|E=1) = k \exp\{ (-1/2)(z-\mu_1)^2/\sigma^2 \}$$

$$p(z|E=0) = k \exp\{ (-1/2)(z-\mu_0)^2/\sigma^2 \}$$

where:

$$k = (2\pi)^{-1/2} \sigma^{-1}$$

$$\Lambda(z) = \frac{\exp\{ (-1/2)(z-\mu_1)^2/\sigma^2 \}}{\exp\{ (-1/2)(z-\mu_0)^2/\sigma^2 \}} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{p_0}{p_1}$$

where $p_0 = p[E=0]$ and $p_1 = p[E=1]$. Thus, the threshold
value is

$$z^* = (\mu_0+\mu_1)/2 + \sigma^2 \ln(p_0/p_1)/(\mu_1-\mu_0)$$

[Figure omitted; horizontal axis: Classification index (z)]

The position of the threshold depends on the relative values
of $p_1$ and $p_0$. The threshold moves toward the group with the
smallest $p_i$. If $p_1 = p_0$ the threshold will be the value of
z where the densities intersect (i.e., where the densities
are equal).
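
The equal variance threshold is a one-line formula, sketched below with hypothetical function names and invented parameter values. The helper `normal_pdf` is included only so that the likelihood ratio at the threshold can be compared with p0/p1.

```python
import math


def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    k = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return k * math.exp(-0.5 * (z - mu) ** 2 / sigma ** 2)


def equal_variance_threshold(mu0, mu1, sigma, p0, p1):
    """Case I threshold: midpoint of the means, shifted by the priors."""
    return (mu0 + mu1) / 2.0 + sigma ** 2 * math.log(p0 / p1) / (mu1 - mu0)


# Invented values: threat group (E = 1) is rarer and has the larger mean
z_star = equal_variance_threshold(0.0, 2.0, 1.0, 0.8, 0.2)
ratio_at_threshold = normal_pdf(z_star, 2.0, 1.0) / normal_pdf(z_star, 0.0, 1.0)
```

With equal priors the threshold falls at the midpoint of the two means. In the invented example, p1 < p0 moves the threshold past the midpoint toward the mean of group 1, and the likelihood ratio there equals p0/p1 = 4.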

b.   Case II: Equal means; different variances

$$\Lambda(z) = \frac{\sigma_0 \exp\{ (-1/2)(z-\mu_1)^2/\sigma_1^2 \}}{\sigma_1 \exp\{ (-1/2)(z-\mu_0)^2/\sigma_0^2 \}} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{p_0}{p_1}$$

with the threshold

$$z^* = \mu_0 \pm \left[ \frac{2\sigma_0^2\sigma_1^2}{\sigma_1^2-\sigma_0^2} \ln\left( \frac{p_0\sigma_1}{p_1\sigma_0} \right) \right]^{1/2}$$

where $\mu_1 = \mu_0$ in this case.

Note that in this situation there are two thresholds. The
group having the smaller variance will lie between the two
thresholds.

[Figure omitted: class conditional densities (curve labeled E = 1); horizontal axis: Classification index (z)]

The thresholds shown are typical of a situation where $p_1 < p_0$.
Note that these thresholds lie between the two intersections
of the densities. If the inequality of prior probabilities
were reversed, the thresholds would lie outside of the
region between the two density intersections. Further note
that the decision region for the group having the lesser
variance lies between the thresholds.
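c.   Case III: General Solution (Referred to as
     the Quadratic Model in the text)

$$p(z|E=1) = (k/\sigma_1) \exp\{ (-1/2)(z-\mu_1)^2/\sigma_1^2 \}$$

$$p(z|E=0) = (k/\sigma_0) \exp\{ (-1/2)(z-\mu_0)^2/\sigma_0^2 \}$$

where $k = (2\pi)^{-1/2}$. The decision rule $\Lambda(z) \gtrless p_0/p_1$ becomes

$$\exp\left\{ \frac{1}{2} \left[ \left( \frac{z-\mu_0}{\sigma_0} \right)^2 - \left( \frac{z-\mu_1}{\sigma_1} \right)^2 \right] \right\} \; \underset{C=0}{\overset{C=1}{\gtrless}} \; \frac{p_0\sigma_1}{p_1\sigma_0}$$

Algebraic manipulation produces

$$(\sigma_1^2-\sigma_0^2) z^2 + 2(\sigma_0^2\mu_1 - \sigma_1^2\mu_0) z + \left[ (\sigma_1^2\mu_0^2 - \sigma_0^2\mu_1^2) - 2\sigma_0^2\sigma_1^2 \ln\left( \frac{p_0\sigma_1}{p_1\sigma_0} \right) \right] \; \underset{C=0}{\overset{C=1}{\gtrless}} \; 0$$

which is recognizable as a quadratic equation in z, with thresholds

$$z^* = \frac{-b \pm (b^2 - 4ac)^{1/2}}{2a}$$

where

$$a = \sigma_1^2 - \sigma_0^2$$

$$b = 2(\sigma_0^2\mu_1 - \sigma_1^2\mu_0)$$

$$c = (\sigma_1^2\mu_0^2 - \sigma_0^2\mu_1^2) - 2\sigma_0^2\sigma_1^2 \ln\left( \frac{p_0\sigma_1}{p_1\sigma_0} \right)$$

[Figure omitted: class conditional densities (curve labeled E = 1); horizontal axis: Classification index (z)]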

The  remarks  given  for  the  figures  in  cases  I  and  II  are  also 
applicable  here.   More  often  than  not,  only  one  of  a  pair  of 
thresholds  induced  by  differing  variances  will  be  of  real 
interest.   If  the  variances  of  the  two  groups  are  radically 
different,  then  both  members  of  the  threshold  pair  become 
important. 
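
The quadratic model can be sketched as a small root finder. The function names and parameter values are hypothetical; the coefficients are those obtained by taking logarithms of the Gaussian likelihood ratio and setting it equal to p0/p1, and the equal-variance branch falls back to the single linear threshold of Case I.

```python
import math


def normal_pdf(z, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def gaussian_thresholds(mu0, sigma0, mu1, sigma1, p0, p1):
    """Values of z where the likelihood ratio equals p0/p1 (sorted)."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    log_term = math.log(p0 * sigma1 / (p1 * sigma0))
    a = v1 - v0
    b = 2.0 * (v0 * mu1 - v1 * mu0)
    c = (v1 * mu0 ** 2 - v0 * mu1 ** 2) - 2.0 * v0 * v1 * log_term
    if a == 0.0:
        # Equal variances: the quadratic degenerates to a linear equation
        return [] if b == 0.0 else [-c / b]
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0.0:
        return []  # no real threshold: one class is always preferred
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


# Invented example: equal means, sigma1 = 2 * sigma0, equal priors
thresholds = gaussian_thresholds(0.0, 1.0, 0.0, 2.0, 0.5, 0.5)
```

In the invented example the thresholds are symmetric about the common mean, at about ±1.36, and the group with the smaller variance (E = 0) occupies the region between them.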

In the foregoing, normal class conditional distributions
were assumed. This was done because the Gaussian
form admits of a rather clean analytical solution. However,
the general concept of the minimum probable error decision
criteria may be applied to any form of density function.
Indeed, the density function of one group need not even be
the same form as that for another group (one might be exponential
and the other Gaussian). The difficulty with most
non-Gaussian forms is that they seldom admit of closed analytical
forms and require numerical means in determination of thresholds.
APPENDIX C

NORTHERN HEMISPHERE PREDICTOR PARAMETERS AVAILABLE
FOR THE NORTH PACIFIC OCEAN, JULY 1979, EXPERIMENTS

Area:   30°-60°N; 145°E-130°W

Model output time:   0000 GMT (TAU00)

A.   Model output
     parameters      Descriptive name of parameters

     Primitive equation model

     TX              Surface air temperature
     EX              Surface vapor pressure
     EHF             Evaporative heat flux
     SEHF            Sensible plus evaporative heat flux
     THF             Total heat flux
     H510            1000-500 mb thickness anomaly
     GGTHTA          Surface-front location parameter
     FTER            Advective fog probability

     Mass structure model

     PS              Surface pressure
     TAIR            Surface air temperature
     EAIR            Surface vapor pressure
     TSEA            Sea surface temperature
     SSANOM          Sea surface temperature anomaly
     T925            925 mb temperature
     U925            925 mb zonal wind component
     V925            925 mb meridional wind component
     NCLOUD          Total cloud cover

     Marine wind model

     WWW             Marine surface wind speed
     DDWW            Marine surface wind direction

B.   Climatological parameter

     CLIMO           National Climatic Center fog
                     frequency climatology

C.   Derived parameters

     ASTD            TAIR-TSEA
     RH              Surface relative humidity
APPENDIX D

NOGAPS PREDICTOR PARAMETERS AVAILABLE FOR THE NORTH
ATLANTIC OCEAN, 15 MAY-15 JULY 1983, EXPERIMENTS

Area:   Entire North Atlantic Ocean and Mediterranean Sea

Model output time:   1200 GMT (TAU00)

A.   Model output
     parameter       Descriptive name of parameter

     D1000           1000 mb geopotential height
     D925            925 mb geopotential height
     D850            850 mb geopotential height
     D700            700 mb geopotential height
     D500            500 mb geopotential height
     D400            400 mb geopotential height
     D300            300 mb geopotential height
     D250            250 mb geopotential height
     TAIR            Surface air temperature
     T1000           1000 mb temperature
     T925            925 mb temperature
     T700            700 mb temperature
     T500            500 mb temperature
     T400            400 mb temperature
     T300            300 mb temperature
     T250            250 mb temperature
     EAIR            Surface vapor pressure
     E1000           1000 mb vapor pressure
     E925            925 mb vapor pressure
     E850            850 mb vapor pressure
     E700            700 mb vapor pressure
     E500            500 mb vapor pressure
     UBLW            Boundary layer zonal wind component
     U1000           1000 mb zonal wind component
     U925            925 mb zonal wind component
U850
U700
U500
U400
U300
U250
VBLW
V1000
V925
V850
V700
V500
V400
V300
V250
VOR925
VOR500
PS
SMF
PBLD
STRTFQ
STRTTH
SHF
ENTRN


** 
** 


DRAG 


*  * 


850 mb zonal wind component
700 mb zonal wind component
500 mb zonal wind component
400 mb zonal wind component
300 mb zonal wind component
250 mb zonal wind component
Boundary layer meridional wind component
1000 mb meridional wind component
925 mb meridional wind component
850 mb meridional wind component
700 mb meridional wind component
500 mb meridional wind component
400 mb meridional wind component
300 mb meridional wind component
250 mb meridional wind component
925 mb vorticity
500 mb vorticity
Surface pressure
Surface moisture flux
Planetary boundary-layer depth
Percent stratus frequency
Stratus thickness
Surface heat flux
Entrainment at top of marine boundary-layer
Drag coefficient (C )


B.   Derived parameters

     DTDP            Vertical gradient of temperature
     DEDP            Vertical gradient of vapor pressure
     DUDP            Vertical gradient of zonal wind
     DVDP            Vertical gradient of meridional wind
     RH              Surface relative humidity

     BM1 ***         2.81132 + (.16201 x EAIR)
                     - (.00237 x E850) - (.0739 x T925)
                     - (.16179 x E925)

     BM2 ***         2.08302 + (.36810 x TAIR)
                     - (.26675 x T1000) - (.15980 x T925)

     BM3 ***         3.00866 + (.11771 x EAIR)
                     - (.01024 x E850) - (.19321 x E925)

     BM4 ***         2.42235 - (.000418 x UBLW)
                     + (.000255 x U700)

     BM5 ***         2.55859 - (.000355 x V1000)

     BM6 ***         2.57317 + (.000893 x D1000)
                     - (.0000489 x D700)

     BM7 ***         -15.2173 + (.01764 x PS)
                     - (.01007 x STRTFQ) + (.02642 x STRTTH)
                     + (.06042 x SHF)

*     Parameters which were not used due to their being
      considered as having little likelihood of being
      important in forecasting marine visibility.

**    Parameters which were not used due to loss of
      significant digits during transfer from tape
      to mass storage.

***   Linear regression equation parameters.
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APPENDIX   E 
SKILL    AND    THREAT    SCORES 


V)    w 

< 
o 

LU    2 

ce 
O 
"■    1 


R 

S 

T 

U 

V 

W 

X 

Y 

z 

1  2  3 

OBSERVED 


Total = R + S + T + U + V + W + X + Y + Z

P1 = (R+U+X)/Total          P3 = (T+W+Z)/Total

P2 = (S+V+Y)/Total          PN = greatest of P1, P2 or P3

Raw scores

A0 = % correct = (X+V+T)/Total

A1 = 1-class error = (U+S+Y+W)/Total

TS1 = Threat score for visibility category I
    = X/(R+U+X+Y+Z)

TS2 = Threat score for visibility category II
    = V/(S+U+V+W+Y)

TS12 = Threat score for visibility categories I and II
     = (X+V)/(Total-T)


TS12 is designed to represent the skill of forecasting
visibility categories I and II as separate categories, rather
than their skill as a combined category, which would be
(U+V+X+Y)/(Total-T).

Adjusted scores

AA0 = (A0-PN)/(1-PN)

ATS1 = (TS1-P1)/(1-P1)

ATS2 = (TS2-P2)/(1-P2)

ATS12 = (TS12-[P1+P2])/(1-[P1+P2])
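
The raw and adjusted scores can be computed directly from the nine cell counts. The following is a minimal sketch, not the original program: the function name `scores` and the row ordering of its argument are assumptions made here. TS2 is taken as V divided by the union of forecast and observed category II counts, V/(S+U+V+W+Y), which is the form that reproduces the tabulated TS2 values. The example counts are the MAXPROB1 dependent-data cells of Table VI.

```python
def scores(table):
    """table rows are forecast categories 1, 2, 3; columns are observed 1, 2, 3."""
    (x, y, z), (u, v, w), (r, s, t) = table
    total = sum(sum(row) for row in table)
    p1 = (r + u + x) / total          # observed frequency of category I
    p2 = (s + v + y) / total          # observed frequency of category II
    p3 = (t + w + z) / total          # observed frequency of category III
    pn = max(p1, p2, p3)
    a0 = (x + v + t) / total          # fraction correct
    a1 = (u + s + y + w) / total      # fraction with a one-class error
    ts1 = x / (r + u + x + y + z)
    ts2 = v / (s + u + v + w + y)
    ts12 = (x + v) / (total - t)
    return {
        "A0": a0, "A1": a1, "TS1": ts1, "TS2": ts2, "TS12": ts12,
        "AA0": (a0 - pn) / (1 - pn),
        "ATS1": (ts1 - p1) / (1 - p1),
        "ATS2": (ts2 - p2) / (1 - p2),
        "ATS12": (ts12 - (p1 + p2)) / (1 - (p1 + p2)),
    }


# MAXPROB1 dependent-data counts from Table VI, listed as forecast 1, 2, 3.
table_vi_a = [[471, 118, 141], [29, 79, 29], [316, 301, 2198]]
for name, value in scores(table_vi_a).items():
    print(f"{name:6s} {value:.2f}")
```

Rounded to two decimals, these values agree with the statistics listed for that table (A0 = .75, TS1 = .44, TS2 = .14, and so on).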
APPENDIX F
TABLES


TABLE I.   A SUMMARY OF THE OBSERVATIONS (PERCENTAGE
           FREQUENCIES) OF THREE VISIBILITY CATEGORIES
           (VISCAT'S), FOR THE NORTH ATLANTIC OCEAN
           HOMOGENEOUS AREAS SHOWN IN FIG. 1, 15 MAY-
           15 JULY 1983


          NUMBER OF
AREA     OBSERVATIONS    VISCAT I         VISCAT II        VISCAT III

1            2725         163 (.06)        436 (.16)        2126 (.78)
2            2867         277 (.10)        317 (.11)        2273 (.79)
3E            131           8 (.06)         31 (.24)          92 (.70)
3W           2288         437 (.19)        284 (.12)        1567 (.68)
4            4771         129 (.03)        597 (.13)        4045 (.85)
5E           1087           9 (.01)         94 (.09)         984 (.91)
5W           2307           8 (.003)        40 (.02)        2259 (.98)
6N            580          19 (.03)         45 (.08)         516 (.89)
6M           2337          21 (.01)        131 (.06)        2185 (.93)
6S             60           1 (.02)          2 (.03)          57 (.95)
7             801           7 (.01)         34 (.04)         760 (.95)
8            1284           1 (.001)        27 (.02)        1256 (.98)

ENTIRE NORTH ATLANTIC AND MEDITERRANEAN

           21,238        1080 (.05)       2038 (.10)      18,120 (.85)
TABLE II.   NUMBER OF OBSERVATIONS (PERCENTAGE FREQUENCIES)
            OF THREE VISIBILITY CATEGORIES (VISCAT'S),
            AND 95% CONFIDENCE INTERVALS FOR THE
            DEPENDENT AND INDEPENDENT DATA, FOR THE NORTH
            PACIFIC OCEAN AND AREA 3W OF THE NORTH
            ATLANTIC OCEAN


North Pacific Ocean, July 1979

                                                                TOTAL # OF
                    VISCAT I       VISCAT II      VISCAT III    OBSERVATIONS

95% CI              .207-.229      .126-.144      .635-.660

Dependent data       816 (.222)     498 (.135)    2368 (.643)       3682

Independent data     388 (.211)     246 (.134)    1207 (.656)       1841

Total               1204 (.218)     744 (.135)    3575 (.647)       5523

North Atlantic Ocean area 3W, FATJUN 1983

95% CI              .175-.207      .111-.138      .666-.704

Dependent data       296 (.194)     190 (.125)    1040 (.682)       1526

Independent data     141 (.185)      94 (.123)     527 (.692)        762

Total                437 (.191)     284 (.124)    1567 (.685)       2288
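
The tabulated 95% confidence intervals appear to be consistent with the usual normal-approximation interval for a proportion, applied to the combined (Total) counts. The sketch below works under that assumption; the table does not say how the intervals were derived, and the function name `normal_ci` is introduced here for illustration.

```python
import math


def normal_ci(count, total, z=1.96):
    """Normal-approximation confidence interval for the proportion count/total."""
    p = count / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p - half_width, p + half_width


# VISCAT I totals for the North Pacific (1204 of 5523) and area 3W (437 of 2288).
for count, total in [(1204, 5523), (437, 2288)]:
    low, high = normal_ci(count, total)
    print(f"{count}/{total}: {low:.3f}-{high:.3f}")
```

With these inputs the intervals round to .207-.229 and .175-.207, matching the VISCAT I entries of the table.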
TABLE III.   THE INITIAL FIVE BEST PREDICTORS FOR
             EPI'S OF FOUR THROUGH TEN, FOR EACH
             STRATEGY, WITH ASSOCIATED PP, a0, a1
             AND CE VALUES FROM THE NORTH PACIFIC
             OCEAN DEPENDENT DATA, JULY 1979


                         Maximum-probability      Natural-regression

EPI  Predictor    PP      a0     a1     CE         a0     a1     CE

 4   EHF         .328    .684   .135   .497       .491   .467   .551
     SEHF        .315    .681   .135   .503       .478   .475   .569
     FTER        .317    .680   .135   .505       .482   .468   .568
     CLIMO       .296    .657   .135   .551       .471   .478   .580
     RH          .311    .649   .135   .567       .508   .442   .542

 5   EHF         .337    .697   .135   .471       .435   .538   .592
     SEHF        .319    .688   .135   .489       .535   .400   .530
     FTER        .314    .678   .135   .509       .539   .396   .526
     RH          .312    .658   .135   .549       .449   .518   .584
     CLIMO       .295    .658   .135   .549       .418   .549   .615

 6   EHF         .338    .695   .135   .475       .491   .467   .551
     SEHF        .319    .690   .135   .485       .478   .475   .569
     FTER        .318    .673   .135   .519       .574   .349   .503
     RH          .316    .661   .135   .543       .508   .442   .542
     CLIMO       .295    .659   .135   .547       .471   .478   .580

 7   EHF         .337    .693   .135   .479       .529   .415   .527
     SEHF        .319    .685   .135   .495       .523   .417   .537
     FTER        .320    .675   .135   .515       .523   .417   .537
     CLIMO       .297    .661   .135   .543       .435   .528   .602
     RH          .314    .659   .135   .547       .308   .654   .730

 8   EHF         .338    .688   .135   .489       .491   .467   .551
     SEHF        .320    .681   .135   .503       .478   .475   .569
     FTER        .320    .680   .135   .505       .553   .377   .517
     CLIMO       .301    .663   .135   .539       .404   .567   .625
     RH          .315    .657   .135   .551       .508   .441   .543

 9   EHF         .340    .693   .135   .479       .522   .425   .531
     SEHF        .322    .686   .135   .493       .514   .429   .543
     FTER        .324    .683   .135   .499       .574   .349   .503
     CLIMO       .299    .663   .135   .539       .443   .516   .598
     RH          .315    .657   .135   .551       .476   .482   .566

10   EHF         .341    .696   .135   .473       .491   .467   .551
     SEHF        .323    .688   .135   .489       .534   .401   .531
     FTER        .322    .678   .135   .509       .539   .396   .526
     CLIMO       .300    .662   .135   .541       .418   .549   .615
     RH          .316    .658   .135   .549       .508   .441   .543
TABLE IV.   FIRST-STAGE CONTINGENCY TABLE STATISTICS
            A0, TS1, AA0 AND ATS1 FOR BOTH DEPENDENT
            AND INDEPENDENT NORTH PACIFIC OCEAN, JULY
            1979, DATA, FOR EPI'S OF FOUR THROUGH TEN
            AND THE MAXIMUM-PROBABILITY STRATEGY, WITH
            EHF AS THE FIRST PREDICTOR FOR EACH NUMBER
            OF EPI'S


           Dependent data                Independent data

EPI    A0     TS1    AA0    ATS1      A0     TS1    AA0    ATS1

 4    .684    .36    .113   .17      .686    .34    .087   .16
 5    .697    .35    .150   .17      .695    .33    .114   .15
 6    .695    .32    .145   .13      .696    .30    .117   .12
 7    .693    .30    .139   .10      .693    .28    .107   .09
 8    .688    .27    .126   .06      .694    .27    .110   .08
 9    .693    .36    .139   .17      .695    .34    .114   .16
10    .696    .35    .149   .17      .695    .33    .114   .15
TABLE V.   FD(96), FD, RSS FD AND a0 FOR STRATEGY
           MAXPROB2, NORTH PACIFIC OCEAN, JULY 1979,
           DEPENDENT DATA, FOR THOSE PREDICTORS
           SELECTED AT EACH STAGE OF THE DEVELOPMENTAL
           MODEL USING FIVE EPI'S.  FD(96) IS COMPUTED
           FROM 100 RANDOMLY GENERATED DATA SETS,
           AS EXPLAINED IN APPENDIX A, AND PROVIDES
           A MEASURE OF HOW MUCH ADDITIONAL
           PREDICTABILITY MAY BE EXPECTED FROM THE
           INCLUSION OF A NEW PREDICTOR.  IDEALLY, RSS FD
           SHOULD BE LESS THAN FD(96)


                         FD, of predictor added, on
Predictor
added       FD(96)     EHF      DDWW     H510     RH       RSS FD    a0

EHF                                                                  .697
DDWW        .1399     .1494      -        -        -       .1494     .699
H510        .1978     .2488    .2185      -        -       .3311     .704
RH          .2423     .2606    .2087    .1515      -       .3666     .746
THF         .2798     .3290    .1464    .1678    .1907     .4408     .820
CLIMO       .3128     .3558    .1727    .1823    .2551       *       .882

*  RSS FD was not computed for CLIMO as the choice for
   the sixth predictor was between only CLIMO and SEHF.
   It was more economical to compute contingency table
   statistics for each and to choose the best predictor
   from those results.
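
The RSS FD column can be checked against the individual FD entries. The sketch below assumes that RSS denotes the root-sum-square of the FD values in a row; that reading is an inference from the numbers (it reproduces the tabulated column to within rounding), not a definition given in this appendix. The helper name `rss` is introduced here.

```python
import math


def rss(values):
    """Root-sum-square of a sequence of FD values."""
    return math.sqrt(sum(v * v for v in values))


# (FD values of the predictor added on each earlier predictor, tabulated RSS FD)
table_v = {
    "DDWW": ([0.1494], 0.1494),
    "H510": ([0.2488, 0.2185], 0.3311),
    "RH": ([0.2606, 0.2087, 0.1515], 0.3666),
    "THF": ([0.3290, 0.1464, 0.1678, 0.1907], 0.4408),
}
for name, (fds, tabulated) in table_v.items():
    print(f"{name:5s} computed {rss(fds):.4f}  tabulated {tabulated:.4f}")
```

Comparing each computed value with FD(96) for the same row then gives the check described in the table caption.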
TABLE VI.   CONTINGENCY TABLES AND RELATED STATISTICS FOR
            BOTH DEPENDENT (3682 OBSERVATIONS) AND
            INDEPENDENT (1841 OBSERVATIONS) NORTH PACIFIC
            OCEAN, JULY 1979, DATA, FROM STAGE FOUR OF
            THE DEVELOPMENTAL MODEL.  PREDICTORS ARE EHF,
            DDWW, H510 AND RH, EACH DIVIDED INTO FIVE
            EPI'S, FOR (A) MAXPROB1, (B) MAXPROB2 AND
            (C) NATURAL-REGRESSION


(a)   MAXPROB1

DEPENDENT DATA

            3  |   316    301   2198
FORECAST    2  |    29     79     29
            1  |   471    118    141
                     1      2      3
                       OBSERVED

A0   = .75      AA0   = .29
A1   = .13
TS1  = .44      ATS1  = .28
TS2  = .14      ATS2  = .01
TS12 = .37      ATS12 = .02

INDEPENDENT DATA

            3  |   175    162   1065
FORECAST    2  |    24     26     35
            1  |   189     58    107
                     1      2      3
                       OBSERVED

A0   = .70      AA0   = .12
A1   = .15
TS1  = .34      ATS1  = .17
TS2  = .09      ATS2  = -.06
TS12 = .28      ATS12 = -.10


(b)   MAXPROB2

DEPENDENT DATA

            3  |   228    238   2077
FORECAST    2  |    25    108     63
            1  |   563    152    228
                     1      2      3
                       OBSERVED

A0   = .75      AA0   = .29
A1   = .13
TS1  = .47      ATS1  = .32
TS2  = .18      ATS2  = .06
TS12 = .42      ATS12 = .10

INDEPENDENT DATA

            3  |   135    136   1007
FORECAST    2  |    23     29     48
            1  |   230     81    152
                     1      2      3
                       OBSERVED

A0   = .69      AA0   = .09
A1   = .16
TS1  = .37      ATS1  = .20
TS2  = .09      ATS2  = -.05
TS12 = .31      ATS12 = -.05


(c)   Natural-Regression

DEPENDENT DATA

            3  |    75    171   1773
FORECAST    2  |   501    279    565
            1  |   240     48     30
                     1      2      3
                       OBSERVED

A0   = .62      AA0   = -.06
A1   = .35
TS1  = .27      ATS1  = .06
TS2  = .18      ATS2  = .05
TS12 = .27      ATS12 = -.13

INDEPENDENT DATA

            3  |    72     91    857
FORECAST    2  |   226    128    298
            1  |    90     27     52
                     1      2      3
                       OBSERVED

A0   = .58      AA0   = -.21
A1   = .35
TS1  = .19      ATS1  = -.02
TS2  = .17      ATS2  = .04
TS12 = .22      ATS12 = -.19
TABLE VII.   LINEAR-REGRESSION EQUATION FOR THE PREDICTED
             VALUE OF THE VISIBILITY CATEGORY (ŷ), ŷ
             STATISTICS WITH RESPECT TO THE ACTUAL
             VISIBILITY CATEGORIES (y) AND THRESHOLD VALUES
             FROM THE EQUAL-VARIANCE ASSUMPTION MODEL,
             NORTH PACIFIC OCEAN, JULY 1979.  NOTATION
             IS AS IN APPENDIX B.


ŷ = 3.78586 + .04118(EHF) - .91412(FTER) - .01592(RH)


Class conditional distributions (i.e., distribution of ŷ for
a given y).

       Number of       Frequency    Mean value      Standard
       observations    of           of              deviation of
 y     of y            y (p)        ŷ (m)           ŷ (σ)

 1      816            .222         2.077 (m1)      .348
 2      498            .135         2.263 (m2)      .382
 3     2368            .643         2.568 (m3)      .353

T1 = threshold between y = 1 and y = 2 = 2.506
T2 = threshold between y = 2 and y = 3 = 1.768
T3 = threshold between y = 1 and y = 3 = 2.048


State conditional distributions for visibility category I
(y = 1), II (y = 2) and III (y = 3) depicting threshold
values and means.

[Figure not recoverable from the scan.  Horizontal axis:
Predicted value (ŷ), 1.0 to 3.5.  Vertical axis: 0 to .075,
label illegible.]
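
The three thresholds of Table VII can be approximately reproduced from the class means, frequencies and standard deviations. The sketch below assumes the standard equal-variance Gaussian decision boundary (the point where the prior-weighted class densities are equal) with a frequency-weighted pooled variance. Appendix B is not reproduced here, so both the formula and the pooling are assumptions rather than the documented method; the function and variable names are likewise introduced for illustration.

```python
import math

# Class statistics from Table VII: frequency p, mean m, standard deviation s.
classes = {
    1: {"p": 0.222, "m": 2.077, "s": 0.348},
    2: {"p": 0.135, "m": 2.263, "s": 0.382},
    3: {"p": 0.643, "m": 2.568, "s": 0.353},
}

# Assumed pooling: class variances weighted by class frequency.
pooled_var = sum(c["p"] * c["s"] ** 2 for c in classes.values())


def equal_variance_threshold(i, j):
    """Value of the predicted category where classes i and j are equally likely."""
    ci, cj = classes[i], classes[j]
    midpoint = 0.5 * (ci["m"] + cj["m"])
    shift = pooled_var / (cj["m"] - ci["m"]) * math.log(ci["p"] / cj["p"])
    return midpoint + shift


for i, j in [(1, 2), (2, 3), (1, 3)]:
    print(f"threshold between y = {i} and y = {j}: {equal_variance_threshold(i, j):.3f}")
```

Under these assumptions the computed values fall within a few thousandths of the tabulated 2.506, 1.768 and 2.048. Because the threshold between categories II and III lies below the threshold between I and III, category II is never selected, which is consistent with an all-zero forecast-II row in a contingency table built from this equation.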
TABLE VIII.   CONTINGENCY TABLES AND RELATED STATISTICS
              FROM LINEAR REGRESSION, FOR BOTH DEPENDENT
              (3682 OBSERVATIONS) AND INDEPENDENT (1841
              OBSERVATIONS) NORTH PACIFIC OCEAN, JULY
              1979, DATA


DEPENDENT DATA

            3  |   389    342   2131
FORECAST    2  |     0      0      0
            1  |   427    156    237
                     1      2      3
                       OBSERVED

A0   = .69      AA0   = .14
A1   = .14
TS1  = .35      ATS1  = .17
TS2  = 0.0      ATS2  = -.16
TS12 = .28      ATS12 = -.13

INDEPENDENT DATA

            3  |   189    176   1076
FORECAST    2  |     0      0      0
            1  |   199     70    131
                     1      2      3
                       OBSERVED

A0   = .69      AA0   = .11
A1   = .13
TS1  = .34      ATS1  = .16
TS2  = 0.0      ATS2  = -.15
TS12 = .26      ATS12 = -.13
TABLE IX.   THE INITIAL FIVE BEST PREDICTORS FOR EPI'S
            OF FOUR THROUGH TEN, FOR EACH STRATEGY,
            WITH ASSOCIATED PP, a0, a1 AND CE VALUES
            FROM THE NORTH ATLANTIC OCEAN AREA 3W
            DEPENDENT DATA, 15 MAY-15 JULY 1983,
            WITHOUT LINEAR-REGRESSION EQUATIONS AS
            PREDICTORS


                         Maximum-probability      Natural-regression

EPI  Predictor    PP      a0     a1     CE         a0     a1     CE

 4   E850        .372    .697   .125   .482       .514   .446   .526
     SHF         .376    .691   .125   .493       .512   .455   .521
     DTDP        .344    .685   .125   .505       .611   .304   .474
     E925        .359    .685   .125   .505       .505   .453   .537
     SMF         .334    .682   .125   .511       .606   .301   .487

 5   E925        .367    .702   .125   .472       .564   .379   .494
     E850        .375    .700   .125   .475       .576   .370   .478
     DTDP        .344    .699   .125   .477       .528   .409   .535
     SHF         .379    .698   .125   .479       .567   .383   .483
     SMF         .337    .686   .125   .503       .526   .409   .539

 6   DTDP        .353    .710   .125   .456       .568   .360   .503
     E850        .374    .699   .125   .477       .609   .324   .458
     SMF         .341    .699   .125   .477       .563   .360   .514
     E925        .363    .695   .125   .485       .595   .334   .476
     SHF         .374    .693   .125   .489       .512   .455   .521

 7   DTDP        .356    .716   .125   .443       .514   .429   .542
     SMF         .348    .706   .125   .463       .590   .325   .495
     E850        .379    .699   .125   .477       .561   .389   .489
     E925        .364    .692   .125   .491       .547   .400   .506
     SHF         .376    .691   .125   .493       .548   .407   .497

 8   SMF         .352    .714   .125   .448       .543   .386   .528
     DTDP        .356    .712   .125   .451       .611   .304   .474
     E850        .378    .700   .125   .475       .588  [illeg.] .469
     SHF         .379    .691   .125   .493       .512   .455   .521
     E925        .364    .685   .125   .505       .577   .360   .486

 9   SMF         .352    .714   .125   .448       .563   .360   .514
     DTDP        .351    .708   .125   .459       .568   .360   .504
     SHF         .382    .700   .125   .475       .541   .417   .501
     E850        .376    .699   .125   .477       .550   .402   .498
     E925        .369    .699   .125   .477       .537   .414   .512

10   SMF         .357    .719   .125   .437       .526   .409   .539
     DTDP        .354    .710   .125   .455       .581   .341   .497
     E925        .369    .702   .125   .471       .564   .379   .493
     E850        .380    .700   .125   .475       .576   .370   .478
     SHF         .381    .698   .125   .479       .567   .383   .483
TABLE X.   FIRST-STAGE CONTINGENCY TABLE STATISTICS A0,
           TS1, AA0 AND ATS1 FOR BOTH DEPENDENT AND
           INDEPENDENT NORTH ATLANTIC OCEAN AREA 3W,
           15 MAY-15 JULY 1983, DATA, FOR EPI'S OF FOUR
           THROUGH TEN AND THE MAXIMUM-PROBABILITY
           STRATEGY, WITHOUT LINEAR-REGRESSION EQUATIONS
           AS PREDICTORS


       Best              Dependent                  Independent

EPI    Predictor    A0    TS1    AA0   ATS1     A0    TS1    AA0   ATS1

 4     E850        .70    .32    .05   .15     .69    .30   -.01   .14
 5     E925        .70    .30    .06   .13     .71    .30    .05   .14
 6     DTDP        .71    .32    .09   .15     .71    .29    .05   .13
 7     DTDP        .72    .31    .11   .14     .71    .28    .07   .11
 8     SMF         .71    .28    .10   .10     .73    .29    .13   .13
 9     SMF         .71    .26    .10   .08     .73    .26    .11   .09
10     SMF         .71    .26    .09   .08     .73    .24    .15   .06
TABLE XI.   SAME AS TABLE IX, EXCEPT WITH LINEAR-
            REGRESSION EQUATIONS AS PREDICTORS


                         Maximum-probability      Natural-regression

EPI  Predictor    PP      a0     a1     CE         a0     a1     CE

 4   BM1         .443    .753   .125   .370       .662   .282   .394
     BM3         .427    .742   .125   .392       .665   .270   .400
     BM2         .395    .713   .125   .450       .516   .455   .512
     BM7         .389    .705   .125   .465       .512   .461   .515
     E850        .372    .697   .125   .482       .514   .446   .526

 5   BM1         .438    .749   .125   .377       .589   .380   .442
     BM3         .433    .749   .125   .377       .590   .374   .446
     BM2         .400    .727   .125   .421       .566   .387   .482
     BM7         .396    .716   .125   .444       .564   .393   .480
     E925        .367    .702   .125   .472       .564   .379   .494

 6   BM1         .449    .752   .125   .372       .628   .332   .413
     BM3         .433    .746   .125   .383       .625   .328   .422
     BM7         .404    .725   .125   .425       .604   .338   .453
     BM2         .399    .723   .125   .429       .517   .454   .512
     DTDP        .353    .710   .125   .456       .568   .360   .503

 7   BM1         .452    .745   .125   .385       .650   .303   .397
     BM3         .434    .740   .125   .394       .575   .393   .457
     BM2         .406    .728   .125   .419       .554   .406   .486
     BM7         .404    .721   .125   .434       .480   .505   .536
     DTDP        .356    .716   .125   .443       .514   .429   .542

 8   BM1         .453    .753   .125   .370       .606   .358   .431
     BM3         .441    .742   .125   .392       .601   .358   .440
     BM2         .405    .724   .125   .427       .585   .364   .466
     BM7         .406    .723   .125   .429       .575   .378   .472
     SMF         .352    .714   .125   .448       .543   .386   .528

 9   BM1         .453    .752   .125   .372       .689   .250   .372
     BM3         .442    .744   .125   .387       .685   .248   .381
     BM7         .410    .723   .125   .430       .540   .427   .493
     BM2         .405    .721   .125   .434       .547   .414   .491
     SMF         .352    .714   .125   .448       .563   .360   .514

10   BM1         .456    .749   .125   .377       .704   .235   .356
     BM3         .444    .749   .125   .377       .647   .301   .404
     BM2         .411    .727   .125   .421       .576   .377   .471
     BM7         .407    .721   .125   .433       .564   .393   .480
     SMF         .357    .719   .125   .438       .526   .409   .539
TABLE XII.   SAME AS TABLE X, EXCEPT WITH LINEAR-
             REGRESSION EQUATIONS AS PREDICTORS AND
             BM1 IS THE PREDICTOR FOR EACH NUMBER
             OF EPI'S


              Dependent                  Independent

EPI     A0    TS1    AA0   ATS1     A0    TS1    AA0   ATS1

 4     .75    .45    .22   .32     .74    .43    .17   .30
 5     .75    .42    .21   .28     .75    .41    .17   .28
 6     .75    .41    .22   .27     .75    .40    .18   .26
 7     .75    .37    .20   .22     .75    .39    .19   .25
 8     .75    .45    .22   .32     .74    .43    .17   .30
 9     .75    .44    .22   .31     .75    .42    .18   .29
10     .75    .42    .21   .28     .75    .41    .17   .28
TABLE XIV.   FD(96), FD, RSS FD AND a0 FOR STRATEGY
             MAXPROB2, NORTH ATLANTIC OCEAN AREA 3W, 15
             MAY-15 JULY 1983, DEPENDENT DATA, WITHOUT
             LINEAR-REGRESSION EQUATIONS AS PREDICTORS,
             FOR THOSE PREDICTORS SELECTED AT EACH STAGE
             OF THE DEVELOPMENTAL MODEL USING FIVE EPI'S.
             FD(96) IS COMPUTED FROM 100 RANDOMLY GENERATED
             DATA SETS, AS EXPLAINED IN APPENDIX A, AND
             PROVIDES A MEASURE OF HOW MUCH ADDITIONAL
             PREDICTABILITY MAY BE EXPECTED FROM THE
             INCLUSION OF A NEW PREDICTOR.  IDEALLY, RSS
             FD SHOULD BE LESS THAN FD(96).


                        FD, of predictor added, on
Predictor
added      FD(96)    E925     U700     DVDP     STRTFQ   ENTRN    RSS FD    a0

E925         -         -        -        -        -        -        -      .702
U700       .1518     .1510      -        -        -        -      .1510    .706
DVDP       .2147     .1581    .1494      -        -        -      .2175    .733
STRTFQ     .2629     .1557    .1904    .1427      -        -      .2844    .813
ENTRN      .3036     .1665    .1556    .1734    .1387      -      .3178    .918
PS         .3394     .1897    .1779    .1492    .1971    .1495    .3887    .950
TABLE XVII.   CONTINGENCY TABLES AND RELATED STATISTICS FOR
              BOTH DEPENDENT (1526 OBSERVATIONS) AND
              INDEPENDENT (762 OBSERVATIONS) NORTH ATLANTIC
              OCEAN AREA 3W, 15 MAY-15 JULY 1983, DATA,
              WITHOUT LINEAR-REGRESSION EQUATIONS AS
              PREDICTORS, FROM STAGE FIVE OF THE
              DEVELOPMENTAL MODEL.  PREDICTORS ARE SMF, D850,
              RH, UBLW AND ENTRN, EACH DIVIDED INTO EIGHT
              EPI'S, FOR (a) MAXPROB1, (b) MAXPROB2 AND
              (c) NATURAL-REGRESSION


(a)   MAXPROB1

DEPENDENT DATA

            3  |     8     11   1039
FORECAST    2  |     5    178      0
            1  |   283      1      1
                     1      2      3
                       OBSERVED

A0   = .98      AA0   = .95
A1   = .01
TS1  = .95      ATS1  = .94
TS2  = .91      ATS2  = .90
TS12 = .95      ATS12 = .92

INDEPENDENT DATA

            3  |    68     61    452
FORECAST    2  |     9     21     38
            1  |    64     12     37
                     1      2      3
                       OBSERVED

A0   = .70      AA0   = .04
A1   = .16
TS1  = .34      ATS1  = .19
TS2  = .15      ATS2  = .03
TS12 = .27      ATS12 = -.05


(b)   MAXPROB2

DEPENDENT DATA

            3  |     0      0   1021
FORECAST    2  |     0    183     10
            1  |   296      7      9
                     1      2      3
                       OBSERVED

A0   = .98      AA0   = .95
A1   = .01
TS1  = .95      ATS1  = .94
TS2  = .92      ATS2  = .90
TS12 = .95      ATS12 = .92

INDEPENDENT DATA

            3  |    54     52    408
FORECAST    2  |    14     23     57
            1  |    73     19     62
                     1      2      3
                       OBSERVED

A0   = .66      AA0   = -.10
A1   = .19
TS1  = .33      ATS1  = .18
TS2  = .14      ATS2  = .02
TS12 = .27      ATS12 = -.05


(c)   Natural-Regression

DEPENDENT DATA

            3  |     0     10   1031
FORECAST    2  |    15    179      9
            1  |   281      1      0
                     1      2      3
                       OBSERVED

A0   = .98      AA0   = .93
A1   = .02
TS1  = .95      ATS1  = .93
TS2  = .84      ATS2  = .81
TS12 = .93      ATS12 = .90

INDEPENDENT DATA

            3  |    54     56    407
FORECAST    2  |    30     28     91
            1  |    57     10     29
                     1      2      3
                       OBSERVED

A0   = .65      AA0   = -.15
A1   = .25
TS1  = .32      ATS1  = .16
TS2  = .13      ATS2  = .01
TS12 = .24      ATS12 = -.10
TABLE XVIII.   SAME AS TABLE XVII, EXCEPT FOR FIVE
               EPI'S.  PREDICTORS ARE E925, U700, DVDP,
               STRTFQ AND ENTRN


(a)   MAXPROB1

DEPENDENT DATA

            3  |    36     49   1027
FORECAST    2  |    21    135      4
            1  |   239      6      9
                     1      2      3
                       OBSERVED

A0   = .92      AA0   = .74
A1   = .05
TS1  = .77      ATS1  = .71
TS2  = .63      ATS2  = .57
TS12 = .75      ATS12 = .63

INDEPENDENT DATA

            3  |    54     60    460
FORECAST    2  |    19     20     27
            1  |    68     14     40
                     1      2      3
                       OBSERVED

A0   = .72      AA0   = .09
A1   = .16
TS1  = .35      ATS1  = .20
TS2  = .14      ATS2  = .02
TS12 = .29      ATS12 = -.02


(b)   MAXPROB2

DEPENDENT DATA

            3  |    11     12    970
FORECAST    2  |     2    148     36
            1  |   283     30     34
                     1      2      3
                       OBSERVED

A0   = .92      AA0   = .74
A1   = .05
TS1  = .79      ATS1  = .73
TS2  = .65      ATS2  = .60
TS12 = .78      ATS12 = .67

INDEPENDENT DATA

            3  |    43     49    426
FORECAST    2  |    12     21     44
            1  |    86     24     57
                     1      2      3
                       OBSERVED

A0   = .70      AA0   = .03
A1   = .17
TS1  = .39      ATS1  = .25
TS2  = .14      ATS2  = .02
TS12 = .32      ATS12 = .01


(c)   Natural-Regression

DEPENDENT DATA

            3  |     3     43    986
FORECAST    2  |    76    142     54
            1  |   217      5      0
                     1      2      3
                       OBSERVED

A0   = .88      AA0   = .63
A1   = .12
TS1  = .72      ATS1  = .65
TS2  = .44      ATS2  = .36
TS12 = .51      ATS12 = .28

INDEPENDENT DATA

            3  |    41     52    424
FORECAST    2  |    39     31     75
            1  |    61     11     28
                     1      2      3
                       OBSERVED

A0   = .68      AA0   = -.05
A1   = .23
TS1  = .34      ATS1  = .19
TS2  = .15      ATS2  = .03
TS12 = .27      ATS12 = -.05
TABLE XIX.   CONTINGENCY TABLES AND RELATED STATISTICS
             FOR BOTH DEPENDENT (1526 OBSERVATIONS) AND
             INDEPENDENT (762 OBSERVATIONS) NORTH ATLANTIC
             OCEAN AREA 3W, 15 MAY-15 JULY 1983, DATA,
             WITH LINEAR-REGRESSION EQUATIONS AS PREDICTORS,
             FROM STAGE FOUR OF THE DEVELOPMENTAL MODEL.
             PREDICTORS ARE BM1, U850, D500 AND V850,
             EACH DIVIDED INTO FOUR EPI'S, FOR (a) MAXPROB1,
             (b) MAXPROB2 AND (c) NATURAL-REGRESSION


(a)   MAXPROB1

DEPENDENT DATA

            3  |    97    120    990
FORECAST    2  |     6     21      5
            1  |   193     49     45
                     1      2      3
                       OBSERVED

A0   = .79      AA0   = .34
A1   = .12
TS1  = .50      ATS1  = .37
TS2  = .10      ATS2  = -.02
TS12 = .40      ATS12 = .12

INDEPENDENT DATA

            3  |    45     74    499
FORECAST    2  |     4      5      4
            1  |    92     15     24
                     1      2      3
                       OBSERVED

A0   = .78      AA0   = .29
A1   = .13
TS1  = .51      ATS1  = .40
TS2  = .05      ATS2  = -.09
TS12 = .37      ATS12 = .09


(b)   MAXPROB2

DEPENDENT DATA

            3  |    77    109    967
FORECAST    2  |     3     21      9
            1  |   216     60     64
                     1      2      3
                       OBSERVED

A0   = .79      AA0   = .34
A1   = .12
TS1  = .51      ATS1  = .40
TS2  = .10      ATS2  = -.02
TS12 = .42      ATS12 = .16

INDEPENDENT DATA

            3  |    36     68    481
FORECAST    2  |     3      8      6
            1  |   102     18     40
                     1      2      3
                       OBSERVED

A0   = .78      AA0   = .27
A1   = .12
TS1  = .51      ATS1  = .40
TS2  = .08      ATS2  = -.05
TS12 = .39      ATS12 = .12


(c)   Natural-Regression

DEPENDENT DATA

            3  |    35     82    875
FORECAST    2  |   131     87    147
            1  |   130     21     18
                     1      2      3
                       OBSERVED

A0   = .72      AA0   = .11
A1   = .25
TS1  = .39      ATS1  = .24
TS2  = .19      ATS2  = .07
TS12 = .33      ATS12 = .02

INDEPENDENT DATA

            3  |    24     49    427
FORECAST    2  |    53     38     87
            1  |    64      7     13
                     1      2      3
                       OBSERVED

A0   = .69      AA0   = .01
A1   = .26
TS1  = .40      ATS1  = .26
TS2  = .16      ATS2  = .05
TS12 = .30      ATS12 = -.01
TABLE XX.   SAME AS TABLE XIX, EXCEPT RESULTS ARE FROM
            STAGE TWO IN THE DEVELOPMENTAL MODEL AND
            PREDICTORS ARE DIVIDED INTO EIGHT EPI'S
            EACH.  PREDICTORS ARE BM1 AND U500


(a)   MAXPROB1

DEPENDENT DATA

            3  |   112    130    965
FORECAST    2  |    10     13      9
            1  |   174     47     66
                     1      2      3
                       OBSERVED

A0   = .75      AA0   = .23
A1   = .13
TS1  = .43      ATS1  = .29
TS2  = .06      ATS2  = -.07
TS12 = .33      ATS12 = .02

INDEPENDENT DATA

            3  |    56     79    484
FORECAST    2  |     1      0      3
            1  |    84     15     40
                     1      2      3
                       OBSERVED

A0   = .75      AA0   = .17
A1   = .13
TS1  = .43      ATS1  = .30
TS2  = 0.0      ATS2  = -.14
TS12 = .30      ATS12 = -.01


(b)   MAXPROB2

DEPENDENT DATA

            3  |    90    118    943
FORECAST    2  |     3      6      4
            1  |   203     66     93
                     1      2      3
                       OBSERVED

A0   = .75      AA0   = .23
A1   = .13
TS1  = .45      ATS1  = .31
TS2  = .03      ATS2  = -.11
TS12 = .36      ATS12 = .06

INDEPENDENT DATA

            3  |    46     76    470
FORECAST    2  |     0      0      2
            1  |    95     18     55
                     1      2      3
                       OBSERVED

A0   = .74      AA0   = .16
A1   = .13
TS1  = .44      ATS1  = .32
TS2  = 0.0      ATS2  = -.14
TS12 = .33      ATS12 = .02


(c)   Natural-Regression

DEPENDENT DATA

            3  |    59     97    873
FORECAST    2  |   170     76    156
            1  |    67     17     11
                     1      2      3
                       OBSERVED

A0   = .67      AA0   = -.05
A1   = .29
TS1  = .21      ATS1  = .02
TS2  = .15      ATS2  = .03
TS12 = .22      ATS12 = -.15

INDEPENDENT DATA

            3  |    32     64    431
FORECAST    2  |    74     25     90
            1  |    35      5      6
                     1      2      3
                       OBSERVED

A0   = .64      AA0   = -.1[?]
A1   = .31
TS1  = .23      ATS1  = .06
TS2  = .10      ATS2  = -.03
TS12 = .18      ATS12 = -.18
TABLE XXI.   LINEAR-REGRESSION EQUATIONS FOR THE PREDICTED
             VALUE OF THE VISIBILITY CATEGORY (ŷ), FOR BOTH
             REGRESSION METHODS, ŷ STATISTICS WITH RESPECT
             TO THE ACTUAL VISIBILITY CATEGORIES (y) AND
             THRESHOLD VALUES FROM BOTH THRESHOLD MODELS,
             NORTH ATLANTIC OCEAN AREA 3W, 15 MAY-15 JULY
             1983.  NOTATION IS AS IN APPENDIX B


A.   Definitions:

LR1   -   Linear regression method 1: single equation,
          three visibility categories

LR2   -   Linear regression method 2: decision tree; two
          equations, two visibility categories each

a     -   All predictors were made available to the
          regression model.

b     -   Only the best predictors from the Preisendorfer
          (1983 a,b,c) methodology were made available
          to the regression model

A     -   Quadratic threshold model (Case III, Appendix B)

B     -   Equal variance threshold model (Case I, Appendix B)


B.   LR1a

ŷ = 2.81132 + .16201(EAIR) - .00237(E850) - .07319(T925)
    - .16179(E925)


Class conditional distributions (i.e., the distribution of ŷ
for a given y).

       Number of       Frequency    Mean value      Standard
       observations    of           of              deviation of
 y     of y            y (p)        ŷ (m)           ŷ (σ)

 1      296            .194         2.014 (m1)      .434
 2      190            .125         2.324 (m2)      .379
 3     1040            .682         2.652 (m3)      .352

LR1aA

T1 = threshold between y = 1 and y = 2 = 2.275
T2 = threshold between y = 2 and y = 3 = 1.839
T3 = threshold between y = 1 and y = 3 = 2.008

(second threshold value, of the pair, was of no interest.
See Appendix B)

LR1aB

T1 = threshold between y = 1 and y = 2 = 2.368
T2 = threshold between y = 2 and y = 3 = 1.768
T3 = threshold between y = 1 and y = 3 = 2.060

State conditional distributions for visibility category
I (y = 1), II (y = 2) and III (y = 3) depicting
threshold values and means.

[Figure not recoverable from the scan.  Horizontal axis:
Predicted value (ŷ).  Vertical axis: 0 to .075, label
illegible.]

C.   LR2a

Equation 1:   ŷ = .90305 + .06122(EAIR) + .11284 x 10^-4 (D850)
                  - .08438(E850) - .04083(T925)

Class conditional distributions

       Number of       Frequency    Mean value      Standard
       observations    of           of              deviation
 y     of y            y (p)        ŷ (m)           of ŷ (σ)

 0      486            .318         .479 (m0)       .222
 1     1040            .682         .776 (m1)       .209

LR2aA:   T = threshold between y = 0 and y = 1 = .4979

LR2aB:   T = threshold between y = 0 and y = 1 = .5110

State conditional distributions for combined visibility
categories I and II (y = 0) and visibility category III
(y = 1) depicting threshold values and means

[Figure not recoverable from the scan.  Horizontal axis:
Predicted value (ŷ), extending to 1.5.  Vertical axis:
0 to .075, label illegible.]

Equation 2:   ŷ = .01229 - .18917 x 10^-3 (U1000)
                  - .02088(T500) + .1339 x 10^-3 (U500)
                  + .15259 x 10^-4 (D925) - .32705 x 10^-2 (STRTFQ)
                  + 7.50153(DEDP) - .03279(DVDP)


Class conditional distributions

       Number of       Frequency    Mean value      Standard
       observations    of           of              deviation
 y     of y            y (p)        ŷ (m)           of ŷ (σ)

 0      296            .609         .319 (m0)       .186
 1      190            .391         .503 (m1)       .194

LR2aA:   T = threshold between y = 0 and y = 1 = .5102
LR2aB:   T = threshold between y = 0 and y = 1 = .4972


State conditional distributions for visibility category I
(y = 0) and II (y = 1) depicting threshold values and means.

[Figure not recoverable from the scan.  Horizontal axis:
Predicted value (ŷ), extending to 1.0.  Vertical axis:
0 to .075, label illegible.]
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TABLE  XXI  (CONT.) 

D.   LR2b 

Equation  1:   y   =   .89952  -  .04830(E850)  +  .02472(SHF) 

+  2.1708KDTDP)  +  6.  81684  (DEDP) 


Class conditional distributions

       Number of       Frequency    Mean value    Standard deviation
  y    observations    of y (p)     of y (m)      of y (σ)
       of y

  0        486           .318       .496 (m0)          .220
  1       1040           .682       .768 (m1)          .201

LR2bA:   T = threshold between y = 0 and y = 1 = .4922

LR2bB:   T = threshold between y = 0 and y = 1 = .5119

State conditional distributions for visibility categories
I and II (y = 0) and visibility category III (y = 1)
depicting threshold values and means.


[Plot omitted: horizontal axis, predicted value (y), ticks at 0, .5, 1.0 and 1.5; vertical-axis ticks at .025, .050 and .075.]

Equation 2:   y = .71769 + .11439 x 10^-3 (V700) - .47810 x 10^-2 (STRTFQ)
                  + 4.5433(DTDP)


Class conditional distributions

       Number of       Frequency    Mean value    Standard deviation
  y    observations    of y (p)     of y (m)      of y (σ)
       of y

  0        296           .609       .337 (m0)          .164
  1        190           .391       .476 (m1)          .177

LR2bA:   T = threshold between y = 0 and y = 1 = .5208
LR2bB:   T = threshold between y = 0 and y = 1 = .4978

State conditional distributions for visibility category I
(y = 0) and II (y = 1) depicting threshold values and means.


[Plot omitted: horizontal axis, predicted value (y), labelled to 1.5; vertical-axis ticks at .025, .050 and .075.]
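
The "B" thresholds in Table XXI (LR2aB and LR2bB, which Tables XXV and XXVII associate with the equal-variance threshold model) appear to follow from the tabulated class statistics. The sketch below is an inference checked against the numbers, not a formula quoted from the text: it assumes two normal class distributions sharing a pooled, count-weighted variance, with class frequencies as prior probabilities, and places the threshold where the two weighted densities are equal. The function name and argument names are hypothetical. The "A" (quadratic model) thresholds are not reproduced here.

```python
import math

def equal_variance_threshold(n0, m0, s0, n1, m1, s1):
    """Boundary between class 0 and class 1 for two normal densities
    with a common (pooled) variance and priors n0/(n0+n1), n1/(n0+n1).
    Assumption: the pooled variance is the count-weighted mean of the
    two class variances."""
    pooled_var = (n0 * s0 ** 2 + n1 * s1 ** 2) / (n0 + n1)
    midpoint = 0.5 * (m0 + m1)
    # The log prior ratio shifts the boundary toward the rarer class mean.
    return midpoint + pooled_var / (m1 - m0) * math.log(n0 / n1)

# Class statistics copied from Table XXI (counts, means, standard deviations).
cases = {
    "LR2aB eq. 1": (486, 0.479, 0.222, 1040, 0.776, 0.209),
    "LR2aB eq. 2": (296, 0.319, 0.186, 190, 0.503, 0.194),
    "LR2bB eq. 1": (486, 0.496, 0.220, 1040, 0.768, 0.201),
    "LR2bB eq. 2": (296, 0.337, 0.164, 190, 0.476, 0.177),
}
thresholds = {name: equal_variance_threshold(*args) for name, args in cases.items()}
for name, t in thresholds.items():
    print(f"{name}: T = {t:.4f}")
```

Under these assumptions the computed values agree with the tabulated .5110, .4972, .5119 and .4978 to within rounding of the three-digit means and standard deviations.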




TABLE XXII.   CONTINGENCY TABLES AND RELATED STATISTICS
              FROM LINEAR REGRESSION METHOD 1 (SINGLE
              EQUATION), QUADRATIC THRESHOLD MODEL, FOR
              BOTH DEPENDENT (1526 OBSERVATIONS) AND
              INDEPENDENT (762 OBSERVATIONS) NORTH
              ATLANTIC OCEAN AREA 3W, 15 MAY-15 JULY 1983,
              DATA, WITH ALL PREDICTORS AVAILABLE TO THE
              REGRESSION MODEL

LR1aA (Table XXI)

DEPENDENT DATA

              3 |  152    151    996
  FORECAST    2 |    0      0      0
              1 |  144     39     44
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .75      AA0   =  .21
  A1   = .12
  TS1  = .38      ATS1  =  .23
  TS2  = 0.0      ATS2  = -.14
  TS12 = .27      ATS12 = -.07

INDEPENDENT DATA

              3 |   69     80    498
  FORECAST    2 |    0      0      0
              1 |   72     14     29
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .75      AA0   =  .18
  A1   = .12
  TS1  = .39      ATS1  =  .25
  TS2  = 0.0      ATS2  = -.14
  TS12 = .27      ATS12 = -.05
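
The unadjusted statistics in Table XXII can be recomputed from the cell counts. The definitions in the sketch below are inferred from the tabulated numbers rather than quoted from the text: A0 is taken as the fraction of forecasts in the correct category, A1 as the fraction in error by exactly one category, and each threat score as correct forecasts in the named categories divided by all cases in which either the forecast or the observation fell in those categories. The function name is hypothetical. The adjusted values (AA0, ATS1, ATS2, ATS12) are not reproduced because their definition is not given in this section.

```python
def contingency_stats(table):
    """table[f][o] = number of cases forecast in category f+1 and
    observed in category o+1 (categories 1..3)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    cells = [(f, o) for f in range(k) for o in range(k)]

    a0 = sum(table[i][i] for i in range(k)) / total
    a1 = sum(table[f][o] for f, o in cells if abs(f - o) == 1) / total

    def threat(cats):
        # Correct forecasts within `cats`, over every case where the
        # forecast or the observation fell in `cats`.
        hits = sum(table[c][c] for c in cats)
        union = sum(table[f][o] for f, o in cells if f in cats or o in cats)
        return hits / union if union else 0.0

    return {"A0": a0, "A1": a1, "TS1": threat({0}),
            "TS2": threat({1}), "TS12": threat({0, 1})}

# Dependent data of Table XXII (LR1aA); rows are forecast categories 1, 2, 3.
dependent = [
    [144, 39, 44],
    [0, 0, 0],
    [152, 151, 996],
]
stats = contingency_stats(dependent)
print({name: round(value, 2) for name, value in stats.items()})
```

With these inferred definitions the dependent-data counts give A0 = .75, A1 = .12, TS1 = .38, TS2 = 0.0 and TS12 = .27, matching the tabulated values.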
TABLE XXIII.   SAME AS TABLE XXII, EXCEPT USING THE
               EQUAL-VARIANCE THRESHOLD MODEL

LR1aB (Table XXI)

DEPENDENT DATA

              3 |  135    147    984
  FORECAST    2 |    0      0      0
              1 |  161     43     56
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .75      AA0   =  .22
  A1   = .12
  TS1  = .41      ATS1  =  .27
  TS2  = 0.0      ATS2  = -.14
  TS12 = .30      ATS12 = -.03

INDEPENDENT DATA

              3 |   65     78    492
  FORECAST    2 |    0      0      0
              1 |   76     16     35
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .75      AA0   =  .17
  A1   = .12
  TS1  = .40      ATS1  =  .26
  TS2  = 0.0      ATS2  = -.14
  TS12 = .28      ATS12 = -.04
TABLE XXIV.   CONTINGENCY TABLES AND RELATED STATISTICS
              FROM LINEAR REGRESSION METHOD 2 (DECISION-
              TREE), QUADRATIC THRESHOLD MODEL, FOR BOTH
              DEPENDENT (1526 OBSERVATIONS) AND INDEPENDENT
              (762 OBSERVATIONS) NORTH ATLANTIC OCEAN AREA
              3W, 15 MAY-15 JULY 1983, DATA, WITH ALL
              PREDICTORS AVAILABLE TO THE REGRESSION MODEL

LR2aA (Table XXI)

DEPENDENT DATA

              3 |  105    118    945
  FORECAST    2 |   11     28     19
              1 |  180     44     76
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .76      AA0   =  .23
  A1   = .13
  TS1  = .43      ATS1  =  .30
  TS2  = .13      ATS2  =  .00
  TS12 = .36      ATS12 =  .06

INDEPENDENT DATA

              3 |   52     68    474
  FORECAST    2 |   11      8      6
              1 |   78     18     47
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .73      AA0   =  .14
  A1   = .14
  TS1  = .38      ATS1  =  .24
  TS2  = .07      ATS2  = -.06
  TS12 = .30      ATS12 = -.01
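
The decision-tree use of the two LR2a equations can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it assumes Equation 1 is evaluated first (y = 1 meaning category III), that Equation 2 is applied only when Equation 1 indicates categories I and II, and that a predicted value equal to the threshold is assigned to the y = 1 class. Coefficients and the LR2aA thresholds are copied from Table XXI; the function names and the predictor values in the usage lines are hypothetical and carry no physical meaning.

```python
# Coefficients from Table XXI, part C (LR2a).
LR2A_EQ1 = {"const": 0.90305, "EAIR": 0.06122, "D850": 0.11284e-4,
            "E850": -0.08438, "T925": -0.04083}
LR2A_EQ2 = {"const": 0.01229, "U1000": -0.18917e-3, "T500": -0.02088,
            "U500": 0.1339e-3, "D925": 0.15259e-4, "STRTFQ": -0.32705e-2,
            "DEDP": 7.50153, "DVDP": -0.03279}
T_EQ1 = 0.4979  # LR2aA threshold for Equation 1
T_EQ2 = 0.5102  # LR2aA threshold for Equation 2

def evaluate(equation, predictors):
    """Linear regression value: constant plus coefficient * predictor."""
    return equation["const"] + sum(
        coef * predictors[name]
        for name, coef in equation.items() if name != "const")

def visibility_category(predictors):
    """Return 1, 2 or 3 (visibility categories I, II, III)."""
    if evaluate(LR2A_EQ1, predictors) >= T_EQ1:
        return 3  # Equation 1: y = 1 is category III
    # Equation 2 separates category I (y = 0) from category II (y = 1).
    return 2 if evaluate(LR2A_EQ2, predictors) >= T_EQ2 else 1

# Hypothetical inputs chosen only to exercise each branch.
names = (set(LR2A_EQ1) | set(LR2A_EQ2)) - {"const"}
base = dict.fromkeys(names, 0.0)
print(visibility_category(base))                              # Equation 1 branch
print(visibility_category({**base, "E850": 10.0}))             # Equation 2, low value
print(visibility_category({**base, "E850": 10.0, "DEDP": 0.1}))  # Equation 2, high value
```

Substituting the LR2aB thresholds (.5110 and .4972) for `T_EQ1` and `T_EQ2` would give the equal-variance variant of the same sketch.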
TABLE XXV.   SAME AS TABLE XXIV, EXCEPT USING THE EQUAL-
             VARIANCE THRESHOLD MODEL

LR2aB (Table XXI)

DEPENDENT DATA

              3 |   96    116    938
  FORECAST    2 |   15     30     26
              1 |  185     44     76
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .76      AA0   =  .23
  A1   = .13
  TS1  = .44      ATS1  =  .31
  TS2  = .13      ATS2  =  .01
  TS12 = .37      ATS12 =  .07

INDEPENDENT DATA

              3 |   49     67    464
  FORECAST    2 |   12      9     13
              1 |   80     18     50
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .73      AA0   =  .11
  A1   = .14
  TS1  = .38      ATS1  =  .24
  TS2  = .08      ATS2  = -.05
  TS12 = .30      ATS12 = -.01
TABLE XXVI.   CONTINGENCY TABLES AND RELATED STATISTICS FROM
              LINEAR REGRESSION METHOD 2 (DECISION-TREE),
              QUADRATIC THRESHOLD MODEL, FOR BOTH DEPENDENT
              (1526 OBSERVATIONS) AND INDEPENDENT (762
              OBSERVATIONS) NORTH ATLANTIC OCEAN AREA 3W,
              15 MAY-15 JULY 1983, DATA, WITH ONLY THOSE
              PREDICTORS IDENTIFIED AS BEST BY THE
              PREISENDORFER METHODOLOGY AVAILABLE TO THE
              REGRESSION MODEL

LR2bA (Table XXI)

DEPENDENT DATA

              3 |  116    127    952
  FORECAST    2 |    5     10     13
              1 |  175     53     75
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .75      AA0   =  .20
  A1   = .13
  TS1  = .41      ATS1  =  .27
  TS2  = .05      ATS2  = -.09
  TS12 = .32      ATS12 =  .01

INDEPENDENT DATA

              3 |   54     72    475
  FORECAST    2 |    4      1      7
              1 |   83     21     45
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .73      AA0   =  .14
  A1   = .14
  TS1  = .40      ATS1  =  .26
  TS2  = .01      ATS2  = -.13
  TS12 = .29      ATS12 = -.02
TABLE XXVII.   SAME AS TABLE XXVI, EXCEPT USING THE
               EQUAL-VARIANCE THRESHOLD MODEL

LR2bB (Table XXI)

DEPENDENT DATA

              3 |  105    116    933
  FORECAST    2 |    8     14     23
              1 |  183     60     84
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .74      AA0   =  .19
  A1   = .14
  TS1  = .42      ATS1  =  .28
  TS2  = .06      ATS2  = -.07
  TS12 = .33      ATS12 =  .02

INDEPENDENT DATA

              3 |   51     71    465
  FORECAST    2 |    5      3     10
              1 |   85     20     52
                +---------------------
                     1      2      3
                       OBSERVED

  A0   = .73      AA0   =  .11
  A1   = .14
  TS1  = .40      ATS1  =  .26
  TS2  = .03      ATS2  = -.11
  TS12 = .30      ATS12 = -.02

APPENDIX G
FIGURES

Fig. 2a.  The behavior of contingency table statistics
for dependent (A0: dashes, TS1: solid) and
independent (A0: chaindots, TS1: chaindashes)
data, as the number of EPI's is varied, for
the North Atlantic Ocean area 3W, 15 May-15
July 1983, when predictors are chosen based upon
the maximum increase of a0 in the dependent
data, for (a) a single predictor (SMF), (b) two
predictors, (c) three predictors, (d) four
predictors, and (e) five predictors.  Numbers
in parentheses represent the number of EPI's
which was fixed for the indicated parameter so
that the number of EPI's for the next predictor
could be varied.

Fig. 3d.  Same as Fig. 2a, except predictors, after the
first, are selected by having the lowest [illegible] for (a)
two predictors (SMF(6) and EH), (b) three predictors,
(c) four predictors, (d) five predictors, and (e) six
predictors.

Fig. 5.  First stage contingency table statistics AA0
and ATS1, dependent data (solid), independent data
(dashed), North Pacific Ocean, July 1979, as a
function of the number of EPI's, from the Preisen-
dorfer (1983a,b) methodology.  EHF is the predictor
for all EPI's.

Fig. 6.  Contingency table statistics AA0 and ATS1 for both
dependent and independent North Pacific Ocean, July
1979, data as a function of the number of predictors
in the model for strategies (a) MAXPROB1 and (b)
MAXPROB2.  Predictors are EHF, DDWW, H510, THF and
CLIMO, each divided into five EPI's.  Negative
values are not plotted.

Fig. 7.  Same as Fig. 5, except for the North Atlantic
Ocean area 3W, 15 May-15 July 1983.  BM1 is the
predictor for all EPI's.

Fig. 8.  Behavior of a0(96) (upper solid), a0(05) (lower
solid), a1(96) (upper dashed), a1(05) (lower dashed),
PP(96) (upper dotted) and PP(05) (lower dotted) from
100 randomly generated data sets, using predictors
from the North Atlantic Ocean area 3W experiment,
with each predictor divided into four EPI's, for (a)
as each predictor is added and (b) as the forecast
array size increases (forecast array size, at any
given stage, is equal to the number of EPI's taken
to the nth power, where n is equal to the number of
predictors included at that stage).

Fig. 9.  Same as Fig. 8, except each predictor is divided
into eight EPI's.

Fig. 10.  Contingency table statistics AA0 and ATS1 for both
dependent and independent North Atlantic Ocean area
3W, 15 May-15 July 1983, data, without linear-
regression equations as predictors, as a function
of the number of predictors in the model for
strategies (a) MAXPROB1 and (b) MAXPROB2.  Predictors
are SMF, D850, RH, UBLW and ENTRN, each
divided into eight EPI's.  Negative values are not
plotted.

Fig. 11.  Same as Fig. 10, except predictors are E925, U700,
DVDP, STRTFQ, ENTRN and PS, each divided into five
EPI's.

Fig. 12.  Contingency table statistics AA0 and ATS1 for both
dependent and independent North Atlantic Ocean area
3W, 15 May-15 July 1983, data, with linear regression
equations as predictors, as a function of the number
of predictors in the model for strategies (a) MAXPROB1
and (b) MAXPROB2.  Predictors are BM1, U850,
D500, V850, D1000 and U1000, each divided into four
EPI's.


[Plot panels (a) and (b) omitted.  Legend: dependent data, AA0 solid, ATS1 dots; independent data, AA0 dashes, ATS1 chaindashes.  Horizontal axis: number of predictors.]


Fig. 13.  Same as Fig. 12, except predictors are BM1, U500,
ENTRN, DVDP and BM4, each divided into eight EPI's.

Fig. 14.  Bivariate plot of EHF as a function of both

Fig. 15.  Joint and marginal probabilities of VISCAT's as a
function of EPI's for EHF.

Fig. 16.  Conditional probabilities of VISCAT's as a
function of EPI's for EHF.

for the first EPI (i = 1) of predictor EHF.

Fig. 19.  Skill diagram with lines of constant a, + 2a~

Fig. 24.  Example of incremental marginal probabilities for